Re: [HACKERS] Tuple data

2000-12-18 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:
 To make it usable for inherited tables requires truly
 horrendous kluges (as you well know). 

 Logical/physical attribute numbers solves it naturally.

Maybe.  At this point that's a theory without experimental evidence
to back it up ;-).  I'm still concerned about how widespread/intrusive
the changes will need to be.

regards, tom lane



Re: [HACKERS] Tuple data

2000-12-18 Thread Emmanuel Charpentier,,,

Hiroshi Inoue wrote :

[ ... ]

 Column order isn't essential in rdbms.

Nitpicking

A relation (a table) is a subset of the Cartesain cross-product of the 
definition domains of the attributes (columns). Cartesian product being 
a commutative operation, "order of columns" does not really exists. Period.

If you impose an order relationship, you *add* inforation to the 
structure. That may be OK, but you can't rely on relational algebra to 
guarantee your results. You'll have to manage it yourself. (And, yes, 
there is relevant algebra for this, too ...).

/Nitpicking

 Isn't it well known that it's not preferable to use
 'select *','insert' without column list etc.. in production
 applications ?

100% agreed. Such a notation is an abbreviation. Handy, but dangerous. 
IMHO, such checking can (should ?) be done by an algorithm checking for 
column *names* before sending the "insert" command.

A partial workaround : inserting in a view containing only the relevant 
columns, in a suitable (and known) order.

[ Back to lurking ... ]




Re: [HACKERS] Tuple data

2000-12-17 Thread Hannu Krosing

Tom Lane wrote:
 
 ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
 critically dependent on heap_getattr returning NULL when an attribute
 beyond the number of attributes actually present in a tuple is accessed.
 That's a fragile and unclean implementation IMHO --- see past traffic
 on this list.

Short of redesigning the whole storage format I can see no better way to
allow 
ALTER ADD COLUMN in any reasonable time. And I cna see no place where
this is 
more "fragile and unclean implementation" than any other in postgres -- 
OTOH it is quite hard for me to "see the past traffic on this list"  as
my 
"PgSQL HACKERS" mail folder is too big for anything else then grep ;)

The notion that anything not stored is NULL seems so natural to me that
it 
is very hard to find any substantial flaw or fragility with it.

--
Hannu



Re: [HACKERS] Tuple data

2000-12-17 Thread Hiroshi Inoue
Hannu Krosing wrote:
 
 Tom Lane wrote:
 
  ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
  critically dependent on heap_getattr returning NULL when an attribute
  beyond the number of attributes actually present in a tuple is accessed.
  That's a fragile and unclean implementation IMHO --- see past traffic
  on this list.
 
 Short of redesigning the whole storage format I can see no better way to
 allow
 ALTER ADD COLUMN in any reasonable time. And I cna see no place where
 this is
 more "fragile and unclean implementation" than any other in postgres --
 OTOH it is quite hard for me to "see the past traffic on this list"  as
 my
 "PgSQL HACKERS" mail folder is too big for anything else then grep ;)


I don't remember the traffic either.
IIRC,I objected to Tom at this point in pgsql-bugs recently.
I think it's very important for dbms that ALTER ADD COLUMN
touches tuples as less as possible.

Regards.
Hiroshi Inoue


Re: [HACKERS] Tuple data

2000-12-17 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
 critically dependent on heap_getattr returning NULL when an attribute
 beyond the number of attributes actually present in a tuple is accessed.
 That's a fragile and unclean implementation IMHO --- see past traffic
 on this list.

 I don't remember the traffic either.
 IIRC,I objected to Tom at this point in pgsql-bugs recently.

That was the traffic I was recalling ;-)

 I think it's very important for dbms that ALTER ADD COLUMN
 touches tuples as less as possible.

I disagree.  The existing ADD COLUMN implementation only works for
appending columns at the end of tuples; it can't handle inserting
a column.  To make it usable for inherited tables requires truly
horrendous kluges (as you well know).  IMHO we'd be far better off
to rewrite ADD COLUMN so that it does go through and change all the
tuples, and then we could get rid of the hackery that tries --- not
very successfully --- to deal with inconsistent column orders between
parent and child tables.

I have a similar opinion about DROP COLUMN ...

regards, tom lane



Re: [HACKERS] Tuple data

2000-12-17 Thread Michael Richards

Considering how often you actually change the structure of a database, I
don't mind waiting for such a reorganisation to take place, however it would
still be nice if it could be done in O(1) time because it would minimise the
amount of downtime required for structure changes.

What are the cases where the current implementation does not handle it
properly?

Restructuring all the tables (inherited too) would require either 2x the
space or lots of hackery to take care of situations where there isn't enough
room for a larger null bitmap. This hackery seems more complicated than just
having alter look for inherited tables and add the column to those as well.

You could define a flag or something so a deleted column could be so flagged
and
ALTER TABLE DELETE COLUMN
would run just as fast. Vacuum could then take care of cleaning out these
columns. If you wanted to make it really exciting, how about searching for a
deleted column for the ADD column. Touch all the tuples by zeroing that
column and finally update pg_attribute. Nothing would be more fun than 2 way
fragmentation :)

-Michael

- Original Message -
From: "Tom Lane" [EMAIL PROTECTED]
To: "Hiroshi Inoue" [EMAIL PROTECTED]
Cc: "Hannu Krosing" [EMAIL PROTECTED]; "Michael Richards"
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Sunday, December 17, 2000 8:05 PM
Subject: Re: [HACKERS] Tuple data


 Hiroshi Inoue [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  ALTER ADD COLUMN doesn't touch any tuples, and you're right that
it's
  critically dependent on heap_getattr returning NULL when an attribute
  beyond the number of attributes actually present in a tuple is
accessed.
  That's a fragile and unclean implementation IMHO --- see past traffic
  on this list.

  I don't remember the traffic either.
  IIRC,I objected to Tom at this point in pgsql-bugs recently.

 That was the traffic I was recalling ;-)

  I think it's very important for dbms that ALTER ADD COLUMN
  touches tuples as less as possible.

 I disagree.  The existing ADD COLUMN implementation only works for
 appending columns at the end of tuples; it can't handle inserting
 a column.  To make it usable for inherited tables requires truly
 horrendous kluges (as you well know).  IMHO we'd be far better off
 to rewrite ADD COLUMN so that it does go through and change all the
 tuples, and then we could get rid of the hackery that tries --- not
 very successfully --- to deal with inconsistent column orders between
 parent and child tables.

 I have a similar opinion about DROP COLUMN ...

 regards, tom lane




Re: [HACKERS] Tuple data

2000-12-17 Thread Tom Lane

"Michael Richards" [EMAIL PROTECTED] writes:
 What are the cases where the current implementation does not handle it
 properly?

Inheritance.

CREATE TABLE parent (a, b, c);

CREATE TABLE child (z) INHERITS (parent);

ALTER TABLE parent ADD COLUMN (d);

With the current implementation you now have column order a,b,c,d in the
parent, and a,b,c,z,d in the child.  This is seriously broken for a
number of reasons, not least being that pg_dump can't realistically be
expected to reproduce that state.

I don't really buy the complaint about "it'll take 2x the space".  So
what?  You'll likely expend that anyway trying to load reasonable data
into the new column.  If we implemented ADD COLUMN in a less klugy
fashion, we could at least support loading a DEFAULT value into the
column (not to mention allowing it to be NOT NULL).  More to the point,
I don't think that using 2x space is a sufficient justification for the
complexity and fragility that are imposed *throughout* the system in
order to make ADD COLUMN's life easy.  You pay those hidden costs every
day you use Postgres, even if you've never done an ADD COLUMN in your
life.

 You could define a flag or something so a deleted column could be so flagged
 and ALTER TABLE DELETE COLUMN would run just as fast.

Hiroshi already tried that; you can find the vestiges of his attempt in
current sources (look for _DROP_COLUMN_HACK__).  Again, the cost to the
rest of the system strikes me as far more than I care to pay.

In the end it's a judgment call --- my judgment is that making these
features fast is not worth the implementation effort and
understandability/reliability penalties that ensue.  I think we would
be better off spending our effort on other things.

regards, tom lane



Re: [HACKERS] Tuple data

2000-12-17 Thread Hiroshi Inoue
Tom Lane wrote:
 
 Hiroshi Inoue [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
  critically dependent on heap_getattr returning NULL when an attribute
  beyond the number of attributes actually present in a tuple is accessed.
  That's a fragile and unclean implementation IMHO --- see past traffic
  on this list.
 
  I don't remember the traffic either.
  IIRC,I objected to Tom at this point in pgsql-bugs recently.
 
 That was the traffic I was recalling ;-)
 
  I think it's very important for dbms that ALTER ADD COLUMN
  touches tuples as less as possible.
 
 I disagree.  The existing ADD COLUMN implementation only works for
 appending columns at the end of tuples; it can't handle inserting
 a column.

Column order isn't essential in rdbms.
Isn't it well known that it's not preferable to use
'select *','insert' without column list etc.. in production
applications ?

 To make it usable for inherited tables requires truly
 horrendous kluges (as you well know). 

Logical/physical attribute numbers solves it naturally.

 IMHO we'd be far better off
 to rewrite ADD COLUMN so that it does go through and change all the
 tuples, and then we could get rid of the hackery that tries --- not
 very successfully --- to deal with inconsistent column orders between
 parent and child tables.
 

We couldn't live without ALTER ADD COLUMN and it's
very critical for me to be able to ADD COLUMN even
when the target table is at full work.
It has been one of my criteria how cool the dbms is.
Fortunately PostgreSQL has been cool but 

Regards.
Hiroshi Inoue


Re: [HACKERS] Tuple data

2000-12-17 Thread Michael Richards

This is what I assumed the problem to be but I wasn't sure if there would be
more to it or not.

My question now is: Should the order in which columns are physically stored
matter?

Since the details of where to find the columns in the tuple data are stored
in pg_attribute, I'd think this is a place where the storage layer should be
free to store it as it likes. Consider as a performance enhancement
shuffling all the variable length columns to the end of the table. This
would save having to look at the size of all the variable length columns in
order to  examine a fixed length column.

Obviously since I only have a brief understanding of how stuff works I'm
relying on you to point out whether this is even a valid suggestion.

-Michael

 Inheritance.

 CREATE TABLE parent (a, b, c);

 CREATE TABLE child (z) INHERITS (parent);

 ALTER TABLE parent ADD COLUMN (d);

 With the current implementation you now have column order a,b,c,d in the
 parent, and a,b,c,z,d in the child.  This is seriously broken for a
 number of reasons, not least being that pg_dump can't realistically be
 expected to reproduce that state.





[HACKERS] Tuple data

2000-12-16 Thread Michael Richards

Hi.

I've still got something I can't seem to get. In my test cases with simple
tables the first uint16 of tuple data after the header contained the length
of the tuple. In this case I can't seem to figure out what the value F24D
stands for when I'd expect it's length to be 0800.

The first tuple in my table has:
OID: 6155665
t_cmin: 32494973
t_cmax: 0
t_xmin: 32494324
t_xmax: 32495742
t_ctid: 55181312:82
t_infomask: A503
Bitmap: 3FF2
Attributes: 7
Data Offset: 36

The flags for this tuple say:
HEAP_MOVED_IN
HEAP_UPDATED
HEAP_XMAX_COMMITTED
HEAP_XMIN_INVALID
HEAP_HASVARLENA
HEAP_HASNULL


Tuple Data:
F24D    1300  4E65 7720 4D61 696C 2046 6F6C 6465 7200
9F00  9F00  48A2 1800

The schema is:
  Attribute  |Type |Modifier
-+-+

 userid  | integer | not null
 folderid| integer | not null default
nextval('folders_folderid_seq'::text)
 foldername  | varchar(25) |
 messages| integer |
 newmessages | integer |
 foldersize  | integer |
 popinfo | integer |
Indices: folder_folderid_idx,
 folders_pkey

thanks

-Michael




Re: [HACKERS] Tuple data

2000-12-16 Thread Tom Lane

"Michael Richards" [EMAIL PROTECTED] writes:
 I've still got something I can't seem to get. In my test cases with simple
 tables the first uint16 of tuple data after the header contained the length
 of the tuple.

That's not right --- AFAIR there is no length in the tuple data.  You
must use the length from the 'page item' pointer that points to this
tuple if you want to know the total tuple length.

If you were testing with tables containing single varlena columns, then
you may have seen the varlena datum's length word and taken it for total
length of the tuple --- but it's only total length of that one column.

Your example dump looks like F24D  is userid,   is folderid,
and 1300  is the varlena length word for foldername.

regards, tom lane



Re: [HACKERS] Tuple data

2000-12-16 Thread Hannu Krosing

Michael Richards wrote:
 
 Hi.
 
 I've still got something I can't seem to get. In my test cases with simple
 tables the first uint16 of tuple data after the header contained the length
 of the tuple. In this case I can't seem to figure out what the value F24D
 stands for when I'd expect it's length to be 0800.

I'm not sure, but you may see some part of the NULL bitmap. 
IIRC it started at a quite illogical place, is suspect it was at byte 31
but 
it still reserved 4bytes for each 32 fields after byte 32

 The first tuple in my table has:
...
 Bitmap: 3F00  00F2
 Attributes: 7

you should have only 4 bytes of bitmap for 7 real attributes

 Data Offset: 36

thats' right  32+4

--
Hannu



Re: [HACKERS] Tuple data

2000-12-16 Thread Hannu Krosing

Michael Richards wrote:
 
  That's not right --- AFAIR there is no length in the tuple data.  You
  must use the length from the 'page item' pointer that points to this
  tuple if you want to know the total tuple length.
 
 Oops, I meant attribute length...
 
  If you were testing with tables containing single varlena columns, then
  you may have seen the varlena datum's length word and taken it for total
  length of the tuple --- but it's only total length of that one column.
 
 Yes, I obviously had assumed that this length was common to all types (I was
 testing with varchars before).
 
 I presume then that I get the sizes based on some system tables. What query
 should I run to give me the layout (in the order it's on disk) and the size
 of each non-varlen attribute?

select * from pg_attribute
 where attrelid = (select oid from pg_class where relname = 'tablename')
 order by attnum;

then look up types by attypid to find the types or just look at attlen
==-1 for varlena types

select * from pg_type where oid = 23; -- gives info for int type
select * from pg_type where oid = 1043;   -- varchar




Hannu



Re: [HACKERS] Tuple data

2000-12-16 Thread Michael Richards

 I'm not sure, but you may see some part of the NULL bitmap.
 IIRC it started at a quite illogical place, is suspect it was at byte 31
 but
 it still reserved 4bytes for each 32 fields after byte 32
Sometimes the t_hoff value in the tuple header is 32 which seems to indicate
no NULL bitmap. This really makes me wonder what happens when you ALTER
TABLE ADD COLUMN on a table since it doesn't seem to take more than O(1)
time. Perhaps it is assumed if the attribute count is less than the actual
number of attributes then the last ones are NULL and no NULL map is
required.

  The first tuple in my table has:
 ...
  Bitmap: 3F00  00F2
  Attributes: 7

 you should have only 4 bytes of bitmap for 7 real attributes

Yes you are correct, my error. To find the bitmap length I was doing:
for (int i=0;iheader-t_hoff-30;i++)
Where if I were able to count it should have been:
for (int i=0;iheader-t_hoff-32;i++)

-Michael




Re: [HACKERS] Tuple data

2000-12-16 Thread Tom Lane

pg_attribute tells you the types and ordering of the attributes
(columns) of a table.  Then see pg_type for the size and alignment
of each type.

regards, tom lane



Re: [HACKERS] Tuple data

2000-12-16 Thread Tom Lane

"Michael Richards" [EMAIL PROTECTED] writes:
 Sometimes the t_hoff value in the tuple header is 32 which seems to indicate
 no NULL bitmap.

There's no null bitmap unless the HASNULLS infomask bit is set.

 This really makes me wonder what happens when you ALTER
 TABLE ADD COLUMN on a table since it doesn't seem to take more than O(1)
 time. Perhaps it is assumed if the attribute count is less than the actual
 number of attributes then the last ones are NULL and no NULL map is
 required.

ALTER ADD COLUMN doesn't touch any tuples, and you're right that it's
critically dependent on heap_getattr returning NULL when an attribute
beyond the number of attributes actually present in a tuple is accessed.
That's a fragile and unclean implementation IMHO --- see past traffic
on this list.

regards, tom lane