Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2007-02-01 Thread Bruce Momjian

Added to TODO:

   o Allow column display reordering by recording a display,
 storage, and permanent id for every column?

 http://archives.postgresql.org/pgsql-hackers/2006-12/msg00782.php



---

Jim C. Nasby wrote:
 On Thu, Dec 21, 2006 at 11:43:27AM -0500, Tom Lane wrote:
  Andrew Dunstan [EMAIL PROTECTED] writes:
   Tom Lane wrote:
   You could make a case that we need *three* numbers: a permanent column
   ID, a display position, and a storage position.
  
   Could this not be handled by some catalog fixup after an add/drop? If we 
   get the having 3 numbers you will almost have me convinced that this 
   might be too complicated after all.
  
  Actually, the more I think about it the more I think that 3 numbers
  might be the answer.  99% of the code would use only the permanent ID.
  Display position would be used in *exactly* one place, namely while
  expanding SELECT foo.* --- I can't think of any other part of the
  backend that would care about it.  (Obviously, client-side code such
  as psql's \d would use it too.)  Use of storage position could be
  localized into a few low-level tuple access functions, probably.
  
  The problems we've been having with the concept stem precisely from
  trying to misuse either display or storage position as a permanent ID.
  That's fine as long as it actually is permanent, but as soon as you
  want to change it then you have problems.  We should all understand
  this perfectly well from a database theory standpoint: pg_attribute
  has to have a persistent primary key.  (attrelid, attnum) is that key,
  and we can't go around altering a column's attnum without creating
  problems for ourselves.
 
 Is there enough consensus on this to add it to the TODO?
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-29 Thread Jim C. Nasby
On Thu, Dec 21, 2006 at 11:43:27AM -0500, Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  You could make a case that we need *three* numbers: a permanent column
  ID, a display position, and a storage position.
 
  Could this not be handled by some catalog fixup after an add/drop? If we 
  get the having 3 numbers you will almost have me convinced that this 
  might be too complicated after all.
 
 Actually, the more I think about it the more I think that 3 numbers
 might be the answer.  99% of the code would use only the permanent ID.
 Display position would be used in *exactly* one place, namely while
 expanding SELECT foo.* --- I can't think of any other part of the
 backend that would care about it.  (Obviously, client-side code such
 as psql's \d would use it too.)  Use of storage position could be
 localized into a few low-level tuple access functions, probably.
 
 The problems we've been having with the concept stem precisely from
 trying to misuse either display or storage position as a permanent ID.
 That's fine as long as it actually is permanent, but as soon as you
 want to change it then you have problems.  We should all understand
 this perfectly well from a database theory standpoint: pg_attribute
 has to have a persistent primary key.  (attrelid, attnum) is that key,
 and we can't go around altering a column's attnum without creating
 problems for ourselves.

Is there enough consensus on this to add it to the TODO?
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-22 Thread Zeugswetter Andreas ADI SD

  You could make a case that we need *three* numbers: a permanent
column
  ID, a display position, and a storage position.
 
  Could this not be handled by some catalog fixup after an add/drop?
If we 
  get the having 3 numbers you will almost have me convinced that this

  might be too complicated after all.
 
 Actually, the more I think about it the more I think that 3 numbers
 might be the answer.  99% of the code would use only the permanent ID.

I am still of the opinion, that the system tables as such are too
visible
to users and addon developers as to change the meaning of attnum.

And I don't quite see what the point is. To alter a table's column you
need
an exclusive lock, and plan invalidation (or are you intending to
invalidate only
plans that reference * ?). Once there you can just as well fix the
numbering.
Yes, it is more work :-(

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

 I'm not sure how much you can do with typing. Things like heap_getattr
 are macros, and thus untyped. Most places use attr as an index to an
 array, which also can't be type checked.
 
 If you switched everything over to inline functions you might 
 get it to
 work, but that's about it.
 
 IMHO the best solution is to offset the logical numbers by some
 constant...

Um, surely you meant offset the physical numbers. Imho the logical
numbers
need to stay 1-n, because those numbers are used way more often and are 
more user visible than the physical.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Zeugswetter Andreas ADI SD wrote:

I'm not sure how much you can do with typing. Things like heap_getattr
are macros, and thus untyped. Most places use attr as an index to an
array, which also can't be type checked.

If you switched everything over to inline functions you might 
get it to

work, but that's about it.

IMHO the best solution is to offset the logical numbers by some
constant...



Um, surely you meant offset the physical numbers. Imho the logical
numbers
need to stay 1-n, because those numbers are used way more often and are 
more user visible than the physical.



  


I don't think we should expose the offset to user view at all - this is 
just for internal use, no?


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 10:27:12AM -0500, Andrew Dunstan wrote:
 Um, surely you meant offset the physical numbers. Imho the logical
 numbers
 need to stay 1-n, because those numbers are used way more often and are 
 more user visible than the physical.
 
 
   
 
 I don't think we should expose the offset to user view at all - this is 
 just for internal use, no?

The thing is, physical index numbers has meaning, the logical index
number does not. In a view definition we're going to store the physical
index, not the logical one, for example. We don't want rearranging
columns to invalidate view definitions or plans.

The number of places needing the logical index are not that man,
relativelyy, and given it has no intrinsic meaning, it's better to give
it a numeric value which is obviously abritrary (like 10001).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 The thing is, physical index numbers has meaning, the logical index
 number does not. In a view definition we're going to store the physical
 index, not the logical one, for example.

Really?  To me that's one of a large number of questions that are
unresolved about how we'd do this.  You can make a case for either
choice in quite a number of places.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

  I don't think we should expose the offset to user view at all - this
is 
  just for internal use, no?
 
 The thing is, physical index numbers has meaning, the logical index
 number does not. In a view definition we're going to store the
physical
 index, not the logical one, for example. We don't want rearranging
 columns to invalidate view definitions or plans.

I think we lack a definition here:

logical number: the order of columns when doing select *
physical number:the position inside the heap tuple (maybe with
offset)

All views and plans and index definitions and most everyting else 
needs to reference the logical number.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  The thing is, physical index numbers has meaning, the logical index
  number does not. In a view definition we're going to store the physical
  index, not the logical one, for example.
 
 Really?  To me that's one of a large number of questions that are
 unresolved about how we'd do this.  You can make a case for either
 choice in quite a number of places.

Can we? For anything of any permenence (view definitions, rules,
compiled functions, plans, etc) you're going to want the physical
number, for the same reason we store the oids of functions and tables.

I can't see the optimiser or executor caring about logical numbers
either. The planner would use it only when looking up column names.

The logical number isn't going to be used much I think. You can go from
column name to physical index directly, without ever looking up the
logical index. That's why I'm suggesting adding some large constant to
the logical numbers, since they're going to be less used in general.

Where do you think we have the choice?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 05:06:53PM +0100, Zeugswetter Andreas ADI SD wrote:
  The thing is, physical index numbers has meaning, the logical index
  number does not. In a view definition we're going to store the
  physical index, not the logical one, for example. We don't want
  rearranging columns to invalidate view definitions or plans.
 
 I think we lack a definition here:
 
 logical number:   the order of columns when doing select *
 physical number:  the position inside the heap tuple (maybe with
 offset)
 
 All views and plans and index definitions and most everyting else 
 needs to reference the logical number.

Huh? If I have an index on the first two columns of a table, it's going
to refernce columns 1 and 2.

If you alter the table to put a column in front of those two, the new
column will be physical 3, logical 1.

If the index references logical numbers, the index has just been
broken. If the index references physical numbers, everything works
without changes.

Same with views, if you use logical numbers you have to rebuild the
view each time. Why bother, when physical numbers work and don't have
that problem?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:
 Really?  To me that's one of a large number of questions that are
 unresolved about how we'd do this.  You can make a case for either
 choice in quite a number of places.

 Can we? For anything of any permenence (view definitions, rules,
 compiled functions, plans, etc) you're going to want the physical
 number, for the same reason we store the oids of functions and tables.

Not if we intend to rearrange the physical numbers during column
add/drop to provide better packing.

You could make a case that we need *three* numbers: a permanent column
ID, a display position, and a storage position.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Zeugswetter Andreas ADI SD

   The thing is, physical index numbers has meaning, the logical
index
   number does not. In a view definition we're going to store the
   physical index, not the logical one, for example. We don't want
   rearranging columns to invalidate view definitions or plans.
  
  I think we lack a definition here:
  
  logical number: the order of columns when doing select *
  physical number:the position inside the heap tuple (maybe with
  offset)
  
  All views and plans and index definitions and most everyting else 
  needs to reference the logical number.
 
 Huh? If I have an index on the first two columns of a table, 
 it's going
 to refernce columns 1 and 2.
 
 If you alter the table to put a column in front of those two, the new
 column will be physical 3, logical 1.

No, you change pg_index to now contain 2,3.

 If the index references logical numbers, the index has just been
 broken. If the index references physical numbers, everything works
 without changes.

yup, sinval

 Same with views, if you use logical numbers you have to rebuild the
 view each time. Why bother, when physical numbers work and don't have
 that problem?

Because it would imho be a nightmare to handle ...

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Tom Lane wrote:

Martijn van Oosterhout kleptog@svana.org writes:
  

On Thu, Dec 21, 2006 at 10:50:59AM -0500, Tom Lane wrote:


Really?  To me that's one of a large number of questions that are
unresolved about how we'd do this.  You can make a case for either
choice in quite a number of places.
  


  

Can we? For anything of any permenence (view definitions, rules,
compiled functions, plans, etc) you're going to want the physical
number, for the same reason we store the oids of functions and tables.



Not if we intend to rearrange the physical numbers during column
add/drop to provide better packing.

You could make a case that we need *three* numbers: a permanent column
ID, a display position, and a storage position.


  


Could this not be handled by some catalog fixup after an add/drop? If we 
get the having 3 numbers you will almost have me convinced that this 
might be too complicated after all.


cheers

andrew


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 You could make a case that we need *three* numbers: a permanent column
 ID, a display position, and a storage position.

 Could this not be handled by some catalog fixup after an add/drop? If we 
 get the having 3 numbers you will almost have me convinced that this 
 might be too complicated after all.

Actually, the more I think about it the more I think that 3 numbers
might be the answer.  99% of the code would use only the permanent ID.
Display position would be used in *exactly* one place, namely while
expanding SELECT foo.* --- I can't think of any other part of the
backend that would care about it.  (Obviously, client-side code such
as psql's \d would use it too.)  Use of storage position could be
localized into a few low-level tuple access functions, probably.

The problems we've been having with the concept stem precisely from
trying to misuse either display or storage position as a permanent ID.
That's fine as long as it actually is permanent, but as soon as you
want to change it then you have problems.  We should all understand
this perfectly well from a database theory standpoint: pg_attribute
has to have a persistent primary key.  (attrelid, attnum) is that key,
and we can't go around altering a column's attnum without creating
problems for ourselves.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Martijn van Oosterhout
On Thu, Dec 21, 2006 at 11:15:38AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Can we? For anything of any permenence (view definitions, rules,
  compiled functions, plans, etc) you're going to want the physical
  number, for the same reason we store the oids of functions and tables.
 
 Not if we intend to rearrange the physical numbers during column
 add/drop to provide better packing.

Urk! If that's what people are suggesting, I'd run away very quickly.
Getting better packing during table create is a nice idea, but
preserving it across add/drop column is just... evil.

Run CLUSTER is you want that, I was expecting add/drop to be a simple
catalog change, nothing more.

 You could make a case that we need *three* numbers: a permanent column
 ID, a display position, and a storage position.

That's just way too complicated IMHO. It add's extra levels of
indirection all over the place.

I was envisiging the physical number to be fixed and immutable (ie
storage position = permanent position).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 I was envisiging the physical number to be fixed and immutable (ie
 storage position = permanent position).

There are two different problems being discussed here, and one of them
is insoluble if we take that position: people would like the system to
automatically lay out tables to minimize alignment overhead and access
costs (eg, put fixed-width columns first).  This is not the same as
I would like to change the display column order.

It's true that for an ADD COLUMN that doesn't already force a table
rewrite, forcing one to improve packing is probably bad.  My thought
would be that we leave the column storage order alone if we don't have
to rewrite the table ... but any rewriting variant of ALTER TABLE could
optimize the storage order while it was at it.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Andrew Dunstan

Tom Lane wrote:

Actually, the more I think about it the more I think that 3 numbers
might be the answer.  99% of the code would use only the permanent ID.
Display position would be used in *exactly* one place, namely while
expanding SELECT foo.* --- I can't think of any other part of the
backend that would care about it.


Insert without a column list will need the logical ordering, I think. 
Also use of like foo in a create table statement. I'm not dead sure 
there aren't one or two others lurking. But I agree that the number is 
small.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Actually, the more I think about it the more I think that 3 numbers
 might be the answer.  99% of the code would use only the permanent ID.

Don't we already have such a permanent number -- just one we don't use
anywhere in the data model? Namely the oid of the pg_attribute entry. It's
actually a bit odd that we don't use it since we use the oid of just about
every other system catalog record as the primary key.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-21 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Actually, the more I think about it the more I think that 3 numbers
 might be the answer.  99% of the code would use only the permanent ID.

 Don't we already have such a permanent number -- just one we don't use
 anywhere in the data model? Namely the oid of the pg_attribute entry.

Nope, because pg_attribute hasn't got OIDs.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Russell Smith

Tom Lane wrote:

Stephen Frost [EMAIL PROTECTED] writes:
  

Force references to go through macros which implement the lookup for the
appropriate type?  ie: LOGICAL_COL(table_oid,2) vs.
PHYSICAL_COL(table_oid,1)  Perhaps that's too simplistic.



It doesn't really address the question of how you know which one to
use at any particular line of code; or even more to the point, what
mechanism will warn you if you use the wrong one.

My gut feeling about this is that we could probably enforce such a
distinction if we were using C++, but while coding in C I have no
confidence in it.  (And no, that's not a vote to move to C++ ...)
  

What about a comprimise...

The 8.1 documentation for ALTER TABLE states the following.

Adding a column with a non-null default or changing the type of an 
existing column will require the entire table to be rewritten. This may 
take a significant amount of time for a large table; and it will 
temporarily require double the disk space.



Now, we are rewriting the table from scratch anyway, the on disk format 
is changing.  What is stopping us from switching the column order at the 
same time.  The only thing I can think is that the catalogs will need 
more work to update them.  It's a middle sized price to pay for being 
able to reorder the columns in the table.  One of the problems I have is 
wanting to add a column in the middle of the table, but FK constraints 
stop me dropping the table to do the reorder.  If ALTER TABLE would let 
me stick it in the middle and rewrite the table on disk, I wouldn't 
care.  It's likely that I would be rewriting the table anyway.  And by 
specifying AT POSITION, or BEFORE/AFTER you know for big tables it's 
going to take a while.


Not that I'm able to code this at all, but I'm interested in feedback on 
this option.


Regards

Russell Smith

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


  



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Martijn van Oosterhout
On Tue, Dec 19, 2006 at 11:29:24PM -0500, Tom Lane wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  Force references to go through macros which implement the lookup for the
  appropriate type?  ie: LOGICAL_COL(table_oid,2) vs.
  PHYSICAL_COL(table_oid,1)  Perhaps that's too simplistic.
 
 It doesn't really address the question of how you know which one to
 use at any particular line of code; or even more to the point, what
 mechanism will warn you if you use the wrong one.

There's one method: Set it up so that when you create a table, it
randomizes the order of the fields on disk. Obviously for production
this isn't smart, but it would test the code a lot. Though in the
regression tests many tables only have one column so they won't be
affected.

If we had unit tests you could create a function called
heap_mangle_tuple which simply does physical reordering but logically
does nothing and feed it in at each point to check the code is
invarient.

Another approach is to number logical columns starting at 1000. This
would mean that at a glance you could tell what you're talking about.
And code using the wrong one will do something obviously bad. If
performance is an issue you could only enable the offset for
--enable-assert builds.

Personally I like this approach because it would encourage everyone
to use the macro to access the fields, since not doing so will place a
constant in an obvious place. It's also trivial for the system to
check.

Personally I'm unsure of the scope of the problem. AFAICS there's
hardly anywhere that would use physical offsets...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Peter Eisentraut
Am Mittwoch, 20. Dezember 2006 04:44 schrieb Tom Lane:
 If you can show me a reasonably bulletproof or machine-checkable way to
 keep the two kinds of column numbers distinct, I'd be all for it.

The only way I can see is to make the domains of the numbers distinct.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Martijn van Oosterhout
On Wed, Dec 20, 2006 at 07:20:14AM -0600, Kenneth Marshall wrote:
 On Wed, Dec 20, 2006 at 01:26:59PM +0100, Peter Eisentraut wrote:
  Am Mittwoch, 20. Dezember 2006 04:44 schrieb Tom Lane:
   If you can show me a reasonably bulletproof or machine-checkable way to
   keep the two kinds of column numbers distinct, I'd be all for it.
  
  The only way I can see is to make the domains of the numbers distinct.
  
 Negative vs. positive numbers?

Negative is used by system columns. Just adding some large constant
(say 1) should be enough.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Peter Eisentraut
Am Mittwoch, 20. Dezember 2006 14:20 schrieb Kenneth Marshall:
 On Wed, Dec 20, 2006 at 01:26:59PM +0100, Peter Eisentraut wrote:
  Am Mittwoch, 20. Dezember 2006 04:44 schrieb Tom Lane:
   If you can show me a reasonably bulletproof or machine-checkable way to
   keep the two kinds of column numbers distinct, I'd be all for it.
 
  The only way I can see is to make the domains of the numbers distinct.

 Negative vs. positive numbers?

That would be an obvious choice, but negative column numbers are already in 
use for system columns.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Andrew Dunstan

Martijn van Oosterhout wrote:

On Wed, Dec 20, 2006 at 07:20:14AM -0600, Kenneth Marshall wrote:
  

On Wed, Dec 20, 2006 at 01:26:59PM +0100, Peter Eisentraut wrote:


Am Mittwoch, 20. Dezember 2006 04:44 schrieb Tom Lane:
  

If you can show me a reasonably bulletproof or machine-checkable way to
keep the two kinds of column numbers distinct, I'd be all for it.


The only way I can see is to make the domains of the numbers distinct.

  

Negative vs. positive numbers?



Negative is used by system columns. Just adding some large constant
(say 1) should be enough.

Have a nice day,
  


Or we could divide the positive number space in two, by starting at 2^14 
(attnums are int2). Then a simple bitmask test would work to distinguish 
them.



cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  Force references to go through macros which implement the lookup for the
  appropriate type?  ie: LOGICAL_COL(table_oid,2) vs.
  PHYSICAL_COL(table_oid,1)  Perhaps that's too simplistic.
 
 It doesn't really address the question of how you know which one to
 use at any particular line of code; or even more to the point, what
 mechanism will warn you if you use the wrong one.

That'd be the point of doing the typing, you then declare functions as
accepting the type and then if someone passes the wrong type to a
function the compiler will complain.  Inside of a particular function it
would hopefully be easier to keep it clear.  I'd think that most
functions would deal with one type or the other (which would be declared
in the arguments or in the local variables) and that functions which
have to deal with both would be able to keep them straight.

 My gut feeling about this is that we could probably enforce such a
 distinction if we were using C++, but while coding in C I have no
 confidence in it.  (And no, that's not a vote to move to C++ ...)

I need to go research what Linux does for this because aiui it's
pretty good about being able to enforce better type-checking than the
stock C types.  The only downside is that I *think* it might be a
GCC-only thing.  In that case I'd think we would still use it but build
some macros which essentially disable it for non-GCC compilers.  As a
mainly-for-developers compile-time check I think as long as a build-farm
member is running GCC and complaining when there are errors (and it can
be disabled on non-GCC compilers) we won't lose any portability from it.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Andrew Dunstan

Russell Smith wrote:

Tom Lane wrote:

Stephen Frost [EMAIL PROTECTED] writes:
 
Force references to go through macros which implement the lookup for 
the

appropriate type?  ie: LOGICAL_COL(table_oid,2) vs.
PHYSICAL_COL(table_oid,1)  Perhaps that's too simplistic.



It doesn't really address the question of how you know which one to
use at any particular line of code; or even more to the point, what
mechanism will warn you if you use the wrong one.

My gut feeling about this is that we could probably enforce such a
distinction if we were using C++, but while coding in C I have no
confidence in it.  (And no, that's not a vote to move to C++ ...)
  

What about a comprimise...

The 8.1 documentation for ALTER TABLE states the following.

Adding a column with a non-null default or changing the type of an 
existing column will require the entire table to be rewritten. This 
may take a significant amount of time for a large table; and it will 
temporarily require double the disk space.



Now, we are rewriting the table from scratch anyway, the on disk 
format is changing.  What is stopping us from switching the column 
order at the same time.  The only thing I can think is that the 
catalogs will need more work to update them.  It's a middle sized 
price to pay for being able to reorder the columns in the table.  One 
of the problems I have is wanting to add a column in the middle of the 
table, but FK constraints stop me dropping the table to do the 
reorder.  If ALTER TABLE would let me stick it in the middle and 
rewrite the table on disk, I wouldn't care.  It's likely that I would 
be rewriting the table anyway.  And by specifying AT POSITION, or 
BEFORE/AFTER you know for big tables it's going to take a while.




This isn't really a compromise. Remember that this discussion started 
with consideration of optimal record layout (minimising space use by 
reducing or eliminating alignment padding). The above proposal really 
does nothing for that.


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Kenneth Marshall
On Wed, Dec 20, 2006 at 01:26:59PM +0100, Peter Eisentraut wrote:
 Am Mittwoch, 20. Dezember 2006 04:44 schrieb Tom Lane:
  If you can show me a reasonably bulletproof or machine-checkable way to
  keep the two kinds of column numbers distinct, I'd be all for it.
 
 The only way I can see is to make the domains of the numbers distinct.
 
Negative vs. positive numbers?

Ken

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Stephen Frost
* Andrew Dunstan ([EMAIL PROTECTED]) wrote:
 This isn't really a compromise. Remember that this discussion started 
 with consideration of optimal record layout (minimising space use by 
 reducing or eliminating alignment padding). The above proposal really 
 does nothing for that.

While I agree that's how the discussion started the column ordering
issue can stand on its own and any proposal which provides that feature
should be considered.  I don't think we should throw out the
rewrite-the-table idea because it doesn't solve other problems.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Richard Huxton

Andrew Dunstan wrote:


Or we could divide the positive number space in two, by starting at 2^14 
(attnums are int2). Then a simple bitmask test would work to distinguish 
them.


Perhaps divide-by-four, then it would be possible to have calculated 
columns (as mentioned recently on one of the lists). In particular, that 
would let you have FK constraints with a constant as part of the key.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I assume space waste will be mostly fixed when we have 0/1 byte headers
 for varlena data types.

Hardly.  int float timestamp etc types will all still have alignment issues.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Martijn van Oosterhout
On Wed, Dec 20, 2006 at 09:15:05AM -0500, Stephen Frost wrote:
  It doesn't really address the question of how you know which one to
  use at any particular line of code; or even more to the point, what
  mechanism will warn you if you use the wrong one.
 
 That'd be the point of doing the typing, you then declare functions as
 accepting the type and then if someone passes the wrong type to a
 function the compiler will complain.  Inside of a particular function it
 would hopefully be easier to keep it clear.  I'd think that most
 functions would deal with one type or the other (which would be declared
 in the arguments or in the local variables) and that functions which
 have to deal with both would be able to keep them straight.

I'm not sure how much you can do with typing. Things like heap_getattr
are macros, and thus untyped. Most places use attr as an index to an
array, which also can't be type checked.

If you switched everything over to inline functions you might get it to
work, but that's about it.

IMHO the best solution is to offset the logical numbers by some
constant...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-20 Thread Russell Smith

Andrew Dunstan wrote:

Russell Smith wrote:

Tom Lane wrote:

Stephen Frost [EMAIL PROTECTED] writes:
 
Force references to go through macros which implement the lookup 
for the

appropriate type?  ie: LOGICAL_COL(table_oid,2) vs.
PHYSICAL_COL(table_oid,1)  Perhaps that's too simplistic.



It doesn't really address the question of how you know which one to
use at any particular line of code; or even more to the point, what
mechanism will warn you if you use the wrong one.

My gut feeling about this is that we could probably enforce such a
distinction if we were using C++, but while coding in C I have no
confidence in it.  (And no, that's not a vote to move to C++ ...)
  

What about a comprimise...

The 8.1 documentation for ALTER TABLE states the following.

Adding a column with a non-null default or changing the type of an 
existing column will require the entire table to be rewritten. This 
may take a significant amount of time for a large table; and it will 
temporarily require double the disk space.



Now, we are rewriting the table from scratch anyway, the on disk 
format is changing.  What is stopping us from switching the column 
order at the same time.  The only thing I can think is that the 
catalogs will need more work to update them.  It's a middle sized 
price to pay for being able to reorder the columns in the table.  One 
of the problems I have is wanting to add a column in the middle of 
the table, but FK constraints stop me dropping the table to do the 
reorder.  If ALTER TABLE would let me stick it in the middle and 
rewrite the table on disk, I wouldn't care.  It's likely that I would 
be rewriting the table anyway.  And by specifying AT POSITION, or 
BEFORE/AFTER you know for big tables it's going to take a while.




This isn't really a compromise. Remember that this discussion started 
with consideration of optimal record layout (minimising space use by 
reducing or eliminating alignment padding). The above proposal really 
does nothing for that.


cheers

andrew


This is partly true.  If you have the ability to rewrite the table and 
put columns in a specific order you can manually minimize the 
alignment padding.  However that will probably produce a table that is 
not in the logical order you would like.  I still see plenty of use case 
for both my initial case as the alignment padding case, even without 
logical layout being different to disk layout.


Also there has been a large about of discussion on performance relating 
to having firm numbers for proposals for different compiler options.  Do 
anybody have tested numbers, and known information about where/how you 
can eliminate padding by column ordering?  Tom suggests in this thread 
that lots of types have padding issues, so how much is it really going 
to buy us space wise if we re-order the table in optimal format.  What 
is the optimal ordering to reduce disk usage?


Russell.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Andrew Dunstan

Gregory Stark wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:

  

I'm not a big fan of ordering columns to optimise record layout, except in the
most extreme cases (massive DW type apps). I think visible column order should
be logical, not governed by physical considerations.



Well as long as we're talking shoulds the database should take care of this
for you anyways.

  


Sure, but the only sane way I can think of to do that would be have 
separate logical and physical orderings, with a map between the two. I 
guess we'd need to see what the potential space savings would be and 
establish what the processing overhead would be, before considering it. 
One side advantage would be that it would allow us to do the often 
requested add column at position x.


cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Martijn van Oosterhout
On Tue, Dec 19, 2006 at 10:48:41AM -0500, Andrew Dunstan wrote:
 Sure, but the only sane way I can think of to do that would be have 
 separate logical and physical orderings, with a map between the two. I 
 guess we'd need to see what the potential space savings would be and 
 establish what the processing overhead would be, before considering it. 
 One side advantage would be that it would allow us to do the often 
 requested add column at position x.

A patch to allow seperate physical and logical orderings was submitted
and rejected. Unless something has changed on that front, any
discussion in this direction isn't really useful.

Once this is possible it would allow a lot of simple savings. For
example, shifting all fixed width fields to the front means they can
all be accessed without looping through the previous columns, for
example.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Robert Treat
On Tuesday 19 December 2006 11:25, Martijn van Oosterhout wrote:
 On Tue, Dec 19, 2006 at 10:48:41AM -0500, Andrew Dunstan wrote:
  Sure, but the only sane way I can think of to do that would be have
  separate logical and physical orderings, with a map between the two. I
  guess we'd need to see what the potential space savings would be and
  establish what the processing overhead would be, before considering it.
  One side advantage would be that it would allow us to do the often
  requested add column at position x.

 A patch to allow seperate physical and logical orderings was submitted
 and rejected. Unless something has changed on that front, any
 discussion in this direction isn't really useful.


The patch was rejected on technical means, and the author decided it was too 
much work to finish it.  If someone wanted to try and complete that work I 
don't think anyone would stand against it. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Tom Lane
Robert Treat [EMAIL PROTECTED] writes:
 On Tuesday 19 December 2006 11:25, Martijn van Oosterhout wrote:
 A patch to allow seperate physical and logical orderings was submitted
 and rejected. Unless something has changed on that front, any
 discussion in this direction isn't really useful.

 The patch was rejected on technical means, and the author decided it was too 
 much work to finish it.  If someone wanted to try and complete that work I 
 don't think anyone would stand against it. 

Apparently you don't remember the discussion.  The fundamental objection
to it was that it would create a never-ending source of bugs, ie, using
the logical column number where the physical number was required or vice
versa.  Even assuming that we could eliminate all such bugs in the code
base at any instant, what would prevent introduction of another such bug
in every patch?  Most ordinary test cases would fail to expose the
difference.

If you can show me a reasonably bulletproof or machine-checkable way to
keep the two kinds of column numbers distinct, I'd be all for it.  But
without that, the answer will remain no.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 If you can show me a reasonably bulletproof or machine-checkable way to
 keep the two kinds of column numbers distinct, I'd be all for it.  But
 without that, the answer will remain no.

Force references to go through macros which implement the lookup for the
appropriate type?  ie: LOGICAL_COL(table_oid,2) vs.
PHYSICAL_COL(table_oid,1)  Perhaps that's too simplistic.  I guess my
feeling on how this would be approached would be that there'd simply be
a level where logical columns are used and a seperate level where
physical columns are used.  Perhaps the storage layer isn't well enough
abstracted for that though.  Another possibility would be to declare
seperate structures for them (or do something else along those lines,
aka, whatever it is the Linux kernel does) and get the compiler to whine
whenever the typing isn't followed correctly.

Just tossing some thoughts out there, I'd *really* like to have
movable-columns and the ability to add columns in where they're most
appropriate instead of off on the end...  If we can settle on an
approach to deal with Tom's concern I'd be willing to look at updating
the patch to implement it though it's not really high enough that I can
promise anything.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: column ordering, was Re: [HACKERS] [PATCHES] Enums patch v2

2006-12-19 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 Force references to go through macros which implement the lookup for the
 appropriate type?  ie: LOGICAL_COL(table_oid,2) vs.
 PHYSICAL_COL(table_oid,1)  Perhaps that's too simplistic.

It doesn't really address the question of how you know which one to
use at any particular line of code; or even more to the point, what
mechanism will warn you if you use the wrong one.

My gut feeling about this is that we could probably enforce such a
distinction if we were using C++, but while coding in C I have no
confidence in it.  (And no, that's not a vote to move to C++ ...)

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate