
Re: [HACKERS] Column storage positions

2008-04-30 Thread Guillaume Smet

On Mon, Apr 2, 2007 at 2:44 AM, Phil Currier <[EMAIL PROTECTED]> wrote:
> On 4/1/07, Guillaume Smet <[EMAIL PROTECTED]> wrote:
>
> > Phil, did you make any progress with your patch?  Your results seemed
> > very encouraging and your implementation interesting.
> > IIRC, the problem was that you weren't interested in working on the
> > "visual/mysqlish" column ordering. As the plan was to decouple column
> > ordering in three different orderings, I don't think it's really a
> > problem if your implementation doesn't support one of them (at least
> > if it doesn't prevent us from having the visual one someday).
> >
>
>  I haven't done much with it since February, largely because my
>  available free time evaporated.  But I do intend to get back to it
>  when I have a chance.  But you're right, the storage position stuff
>  I've worked on is completely independent from display positions, and
>  certainly wouldn't prevent that being added separately.
>
>
>
> > Is there any chance you keep us posted with your progress and post a
> > preliminary patch exposing your design choices? This could allow other
> > people to see if there are interesting results with their particular
> > database and workload.
> >
>
>  Yeah, I'll try to clean things up and post a patch eventually.  And if
>  anyone feels like working on the display position piece, let me know;
>  perhaps we could pool our efforts for 8.4.

Hi Phil,

Did you make any progress on this cleanup? It seems like good timing
to revive this project if we want it for 8.4.

Thanks for your feedback.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Column storage positions

2007-04-01 Thread Andrew Dunstan


Phil Currier wrote:

> I haven't done much with it since February, largely because my
> available free time evaporated.  But I do intend to get back to it
> when I have a chance.  But you're right, the storage position stuff
> I've worked on is completely independent from display positions, and
> certainly wouldn't prevent that being added separately.

I agree with this comment from Tom last time it was discussed:

> In any case I think it's foolish not to tackle both issues at once.
> We know we'd like to have both features and we know that all the same
> bits of code need to be looked at to implement either.


Just tackling the side of the problem that interests you is probably not 
the ideal way to go.



cheers

andrew

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Column storage positions

2007-04-01 Thread Phil Currier


On 4/1/07, Guillaume Smet <[EMAIL PROTECTED]> wrote:

> Phil, did you make any progress with your patch?  Your results seemed
> very encouraging and your implementation interesting.
> IIRC, the problem was that you weren't interested in working on the
> "visual/mysqlish" column ordering. As the plan was to decouple column
> ordering in three different orderings, I don't think it's really a
> problem if your implementation doesn't support one of them (at least
> if it doesn't prevent us from having the visual one someday).

I haven't done much with it since February, largely because my
available free time evaporated.  But I do intend to get back to it
when I have a chance.  But you're right, the storage position stuff
I've worked on is completely independent from display positions, and
certainly wouldn't prevent that being added separately.

> Is there any chance you keep us posted with your progress and post a
> preliminary patch exposing your design choices? This could allow other
> people to see if there are interesting results with their particular
> database and workload.

Yeah, I'll try to clean things up and post a patch eventually.  And if
anyone feels like working on the display position piece, let me know;
perhaps we could pool our efforts for 8.4.

phil


Re: [HACKERS] Column storage positions

2007-04-01 Thread Guillaume Smet


On 2/20/07, Phil Currier <[EMAIL PROTECTED]> wrote:

> Inspired by this thread [1], and in particular by the idea of storing
> three numbers (permanent ID, on-disk storage position, display
> position) for each column, I spent a little time messing around with a
> prototype implementation of column storage positions to see what kind
> of difference it would make.


Phil, did you make any progress with your patch?  Your results seemed
very encouraging and your implementation interesting.
IIRC, the problem was that you weren't interested in working on the
"visual/mysqlish" column ordering. As the plan was to decouple column
ordering in three different orderings, I don't think it's really a
problem if your implementation doesn't support one of them (at least
if it doesn't prevent us from having the visual one someday).

Is there any chance you keep us posted with your progress and post a
preliminary patch exposing your design choices? This could allow other
people to see if there are interesting results with their particular
database and workload.

It's too late for 8.3 but it could be a nice thing to have in 8.4.

Thanks in advance.

Regards.

--
Guillaume


Re: [HACKERS] Column storage positions

2007-02-22 Thread Zeugswetter Andreas ADI SD


> >> I agree, I haven't thought of drop column :-( Drop column should have
> >> relabeled attnum. Since it was not done then, my comments are
> >> probably moot.
> >
> > We can correct this problem now.
> 
> How?  If attnum is serving as both physical position and 
> logical order, how can you make it be logical position 
> without breaking physical position?

If you ask me, attnum would be the logical position and would be used
in all other system tables. attphypos would only be used in
pg_attribute.
It would be quite some work to rearrange attnum in all system tables for
"drop column" and "add column before", but it would be nice for jdbc.

But it seems others want this: attnum being an arbitrary number
that is used in all system tables, plus 2 extra columns in pg_attribute,
one for logical position and one for physical position.
If you want the colname corresponding to a pg_index attnum, you then need a map.

Andreas


Re: [HACKERS] Column storage positions

2007-02-22 Thread Zeugswetter Andreas ADI SD


> > I agree, I haven't thought of drop column :-( Drop column should have
> > relabeled attnum.
> > Since it was not done then, my comments are probably moot.
> 
> We can correct this problem now.

Do you mean fix it with the 3rd column in pg_attribute and use that,
or fix attnum ? :-)

Imho it is a pain to need 2 numbers and a mapping in drivers.

Andreas


Re: [HACKERS] Column storage positions

2007-02-22 Thread Alvaro Herrera

Kris Jurka escribió:
> 
> 
> On Thu, 22 Feb 2007, Alvaro Herrera wrote:
> 
> >Zeugswetter Andreas ADI SD escribió:
> >>
> >>I agree, I haven't thought of drop column :-( Drop column should have 
> >>relabeled attnum. Since it was not done then, my comments are probably 
> >>moot.
> >
> >We can correct this problem now.
> 
> How?  If attnum is serving as both physical position and logical order, 
> how can you make it be logical position without breaking physical 
> position?

By patching the code, of course, so that it doesn't serve as both
things, which is what is being discussed.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] Column storage positions

2007-02-22 Thread Kris Jurka




On Thu, 22 Feb 2007, Alvaro Herrera wrote:

> Zeugswetter Andreas ADI SD escribió:
>
>> I agree, I haven't thought of drop column :-( Drop column should have
>> relabeled attnum. Since it was not done then, my comments are probably
>> moot.
>
> We can correct this problem now.

How?  If attnum is serving as both physical position and logical order,
how can you make it be logical position without breaking physical
position?


Kris Jurka


Re: [HACKERS] Column storage positions

2007-02-22 Thread Alvaro Herrera

Zeugswetter Andreas ADI SD escribió:
> 
> > > And I also see a lot of unhappiness from users of system tables
> > > when column numbers all over the system tables would not be
> > > logical column positions any more.
> > 
> > Right now the fact that attnum presents the logical order but 
> > not the logical position is a problem for the JDBC driver.  
> > In the presence of dropped columns there is no easy way to 
> > get from a pg_attribute entry to logical position.  I would 
> > hope that a new logical position column would reflect the 
> > actual position and solve this problem.
> 
> I agree, I haven't thought of drop column :-( Drop column should have
> relabeled attnum.
> Since it was not done then, my comments are probably moot. 

We can correct this problem now.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] Column storage positions

2007-02-22 Thread Zeugswetter Andreas ADI SD


> > And I also see a lot of unhappiness from users of system tables when
> > column numbers all over the system tables would not be logical column
> > positions any more.
> 
> Right now the fact that attnum presents the logical order but 
> not the logical position is a problem for the JDBC driver.  
> In the presence of dropped columns there is no easy way to 
> get from a pg_attribute entry to logical position.  I would 
> hope that a new logical position column would reflect the 
> actual position and solve this problem.

I agree, I haven't thought of drop column :-( Drop column should have
relabeled attnum.
Since it was not done then, my comments are probably moot. 

Andreas


Re: [HACKERS] Column storage positions

2007-02-22 Thread Kris Jurka




On Thu, 22 Feb 2007, Zeugswetter Andreas ADI SD wrote:

> And I also see a lot of unhappiness from users of system tables when
> column numbers all over the system tables would not be logical column
> positions any more.

Right now the fact that attnum presents the logical order but not the
logical position is a problem for the JDBC driver.  In the presence of
dropped columns there is no easy way to get from a pg_attribute entry to
logical position.  I would hope that a new logical position column would
reflect the actual position and solve this problem.
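
To make the driver-side problem concrete, here is a minimal Python sketch of the mapping a client has to build today. It is illustrative only, not JDBC driver or backend code. It assumes a list of pg_attribute-like rows carrying attnum and attisdropped; the Attr type, the logical_positions helper and the sample rows are hypothetical names made up for this example.

```python
from typing import NamedTuple


class Attr(NamedTuple):
    attname: str
    attnum: int
    attisdropped: bool


def logical_positions(attrs):
    """Map attname -> 1-based logical position, skipping dropped columns."""
    positions = {}
    next_pos = 1
    # attnum still gives the logical *order*, so sort by it and count
    # only the columns that are still visible.
    for attr in sorted(attrs, key=lambda a: a.attnum):
        if attr.attisdropped:
            continue
        positions[attr.attname] = next_pos
        next_pos += 1
    return positions


# Hypothetical table where the second column was dropped.
attrs = [
    Attr("id", 1, False),
    Attr("........pg.dropped.2........", 2, True),
    Attr("name", 3, False),
    Attr("created", 4, False),
]

print(logical_positions(attrs))  # {'id': 1, 'name': 2, 'created': 3}
```

Here "name" has attnum 3 but logical position 2, which is the gap a dedicated logical position column would close.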


Kris Jurka


Re: [HACKERS] Column storage positions

2007-02-22 Thread Zeugswetter Andreas ADI SD


> > Yes, that was the idea (not oid but some number), and I am arguing 
> > against it. Imho people are used to see the logical position in e.g.
> > pg_index
> >
> 
> Which people are you talking about? In my commercial PG work 
> I hardly ever look at a system table at all, and users 
> shouldn't have to IMNSHO. 

You are probably right. I tend to resort to command-line tools, schema
dumps and system tables; probably not many other "people" do that. I
often don't get to use my preferred toolset because it is not installed.


> If you mean tools developers, then accommodating to catalog 
> changes is par for the course, I should think.

The question is whether the distributed work needed to get all the
tools and interfaces (like jdbc, odbc, pgadmin) to work again isn't more
work than doing it in the backend would be.

Since we want plan invalidation anyway, I am not sure the rest is so
much.

Andreas


Re: [HACKERS] Column storage positions

2007-02-22 Thread Andrew Dunstan


Zeugswetter Andreas ADI SD wrote:

> Yes, that was the idea (not oid but some number), and I am arguing
> against it. Imho people are used to see the logical position in e.g.
> pg_index

Which people are you talking about? In my commercial PG work I hardly
ever look at a system table at all, and users shouldn't have to IMNSHO.
If you mean tools developers, then accommodating to catalog changes is
par for the course, I should think.


cheers

andrew



Re: [HACKERS] Column storage positions

2007-02-22 Thread Zeugswetter Andreas ADI SD


> > And I also see a lot of unhappiness from users of system tables when
> > column numbers all over the system tables would not be logical column
> > positions any more.
> 
> Are you arguing against the feature? Or against the suggested design?

Against the design.

> I should have thought (without much looking) one possible way 
> to implement it would be to put Oids on pg_attribute for the 
> permanent id, and keep attnum for the (now mutable) logical 
> order, adding a further column for the physical order.

Yes, that was the idea (not oid but some number), and I am arguing
against it. Imho people are used to see the logical position in e.g.
pg_index

I know it is a lot of work to update all those dependencies in the
system tables to reorder logical position, but that is the path I think
should be taken. And the first step in that direction is Phil's patch.

Andreas


Re: [HACKERS] Column storage positions

2007-02-22 Thread Andrew Dunstan


Zeugswetter Andreas ADI SD wrote:

> And I also see a lot of unhappiness from users of system tables
> when column numbers all over the system tables would not be logical
> column positions any more.

Are you arguing against the feature? Or against the suggested design?

I should have thought (without much looking) one possible way to
implement it would be to put Oids on pg_attribute for the permanent id,
and keep attnum for the (now mutable) logical order, adding a further
column for the physical order.



cheers

andrew


Re: [HACKERS] Column storage positions

2007-02-22 Thread Zeugswetter Andreas ADI SD


> > In any case I think it's foolish not to tackle both issues at once.
> > We know we'd like to have both features and we know that all the same
> > bits of code need to be looked at to implement either.
> 
> I guess I disagree with that sentiment.  I don't think it's 
> necessary to bundle these two features together, even if some 
> analysis will be duplicated between them, since they are 
> completely distinct in a functional sense and will touch 
> different places in the code.

I fully agree with Phil here. 

And I also see a lot of unhappiness from users of system tables
when column numbers all over the system tables would not be logical
column positions any more.

Andreas



Re: [HACKERS] Column storage positions

2007-02-22 Thread Robert Treat

On Thursday 22 February 2007 09:06, Phil Currier wrote:
> On 2/22/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > > Alvaro Herrera wrote:
> > >> Right, I'm not advocating not doing that -- I'm just saying that the
> > >> first step to that could be decoupling physical position with attr id
> > >> :-) Logical column ordering (the order in which SELECT * expands to)
> > >> seems to me to be a different feature.
> > >
> > > Except in the sense that divorcing the id from the storage order makes
> > > it possible to do sanely. :-)
> >
> > They are different features, but they are going to hit all the same
> > code, because the hardest part of this remains making sure that every
> > piece of the code is using the right kind of column number.  The
> > suggestion I posted awhile ago amounts to saying that we might be able
> > to solve that by default, by making sure that only one definition of
> > "column number" is relevant to the majority of the backend and we can
> > figure out exactly where the other definitions need to apply.  But
> > that's handwaving until someone actually does it :-(
>
> I don't really think it's just handwaving at this point because I've
> done a lot of it :).  I'm not saying the work is done, or that a lot
> more testing isn't required, but at the moment I have a working system
> that seems to do what it needs to do to separate storage position from
> permanent ID/display position.  And the changes to accomplish this
> were quite localized - namely the tuple access routines in
> heaptuple.c, and the small handful of places that need to construct
> tuple descriptors.  That's pretty much it - the rest of the codebase
> remains untouched.
>
> > In any case I think it's foolish not to tackle both issues at once.
> > We know we'd like to have both features and we know that all the same
> > bits of code need to be looked at to implement either.
>
> I guess I disagree with that sentiment.  I don't think it's necessary
> to bundle these two features together, even if some analysis will be
> duplicated between them, since they are completely distinct in a
> functional sense and will touch different places in the code.
> Smaller, more incremental changes make more sense to me.
>

Can you post a patch of what you have now to -patches? 

> But if both-features-at-once is what the community wants, that's fine,
> no worries.  I'll just pull my own personal hat out of the ring until
> someone comes along who's interested in implementing them both at the
> same time.
>

Are you that opposed to working on the display portions as well?  You'll be a 
hero to thousands of mysql users if you do it. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: [HACKERS] Column storage positions

2007-02-22 Thread Phil Currier

On 2/22/07, Tom Lane <[EMAIL PROTECTED]> wrote:

> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > Alvaro Herrera wrote:
> >> Right, I'm not advocating not doing that -- I'm just saying that the
> >> first step to that could be decoupling physical position with attr id
> >> :-) Logical column ordering (the order in which SELECT * expands to)
> >> seems to me to be a different feature.
>
> > Except in the sense that divorcing the id from the storage order makes
> > it possible to do sanely. :-)
>
> They are different features, but they are going to hit all the same
> code, because the hardest part of this remains making sure that every
> piece of the code is using the right kind of column number.  The
> suggestion I posted awhile ago amounts to saying that we might be able
> to solve that by default, by making sure that only one definition of
> "column number" is relevant to the majority of the backend and we can
> figure out exactly where the other definitions need to apply.  But
> that's handwaving until someone actually does it :-(

I don't really think it's just handwaving at this point because I've
done a lot of it :).  I'm not saying the work is done, or that a lot
more testing isn't required, but at the moment I have a working system
that seems to do what it needs to do to separate storage position from
permanent ID/display position.  And the changes to accomplish this
were quite localized - namely the tuple access routines in
heaptuple.c, and the small handful of places that need to construct
tuple descriptors.  That's pretty much it - the rest of the codebase
remains untouched.

> In any case I think it's foolish not to tackle both issues at once.
> We know we'd like to have both features and we know that all the same
> bits of code need to be looked at to implement either.

I guess I disagree with that sentiment.  I don't think it's necessary
to bundle these two features together, even if some analysis will be
duplicated between them, since they are completely distinct in a
functional sense and will touch different places in the code.
Smaller, more incremental changes make more sense to me.

But if both-features-at-once is what the community wants, that's fine,
no worries.  I'll just pull my own personal hat out of the ring until
someone comes along who's interested in implementing them both at the
same time.

phil


Re: [HACKERS] Column storage positions

2007-02-22 Thread Simon Riggs

On Wed, 2007-02-21 at 16:57 -0300, Alvaro Herrera wrote:
> Andrew Dunstan escribió:
> > Simon Riggs wrote:
> > >
> > >I agree with comments here about the multiple orderings being a horrible
> > >source of bugs, as well as lots of coding even to make it happen at all
> > >http://archives.postgresql.org/pgsql-hackers/2006-12/msg00859.php
> > 
> > I thought we were going with this later proposal of Tom's (on which he's 
> > convinced me): 
> > http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php - if 
> > not I'm totally confused (situation normal). The current thread started 
> > with this sentence:
> > 
> > >Inspired by this thread [1], and in particular by the idea of storing
> > >three numbers (permanent ID, on-disk storage position, display
> > >position) for each column, I spent a little time messing around with a
> > >prototype implementation of column storage positions to see what kind
> > >of difference it would make.
> > 
> > I haven't understood Alvaro to suggest not keeping 3 numbers.
> 
> Right, I'm not advocating not doing that -- I'm just saying that the
> first step to that could be decoupling physical position with attr id
> :-) Logical column ordering (the order in which SELECT * expands to)
> seems to me to be a different feature.

Not disagreed. :-)

Something very, very simple seems most likely to be an effective
additional feature for 8.3. We can implement the 2/3 position version
for 8.4

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Column storage positions

2007-02-21 Thread Tom Lane

Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Alvaro Herrera wrote:
>> Right, I'm not advocating not doing that -- I'm just saying that the
>> first step to that could be decoupling physical position with attr id
>> :-) Logical column ordering (the order in which SELECT * expands to)
>> seems to me to be a different feature.

> Except in the sense that divorcing the id from the storage order makes 
> it possible to do sanely. :-)

They are different features, but they are going to hit all the same
code, because the hardest part of this remains making sure that every
piece of the code is using the right kind of column number.  The
suggestion I posted awhile ago amounts to saying that we might be able
to solve that by default, by making sure that only one definition of
"column number" is relevant to the majority of the backend and we can
figure out exactly where the other definitions need to apply.  But
that's handwaving until someone actually does it :-(

In any case I think it's foolish not to tackle both issues at once.
We know we'd like to have both features and we know that all the same
bits of code need to be looked at to implement either.

regards, tom lane


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan


elein wrote:

> The storage order is orthogonal to the display order.  display order can be
> handled in attnum and the new storage order can be the new column.

If you review the earlier discussion you will see that it is proposed
(by Tom) to have 3 numbers (i.e. 2 new cols): an immutable id and
mutable storage/physical order and display/logical order.
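
As a rough illustration of the three-number idea, the following Python sketch keeps one immutable id per column plus two independently mutable orderings. It is only a toy model: the field names permanent_id, storage_pos and display_pos, the Column type and the two helpers are hypothetical and do not correspond to actual catalog columns or to Phil's patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    permanent_id: int  # never changes once assigned (hypothetical name)
    storage_pos: int   # on-disk order, chosen by the system (hypothetical name)
    display_pos: int   # order SELECT * would expand to (hypothetical name)


def storage_order(columns):
    """Column names in the order they would be laid out in a tuple."""
    return [c.name for c in sorted(columns, key=lambda c: c.storage_pos)]


def display_order(columns):
    """Column names in the order a user would see them."""
    return [c.name for c in sorted(columns, key=lambda c: c.display_pos)]


# Declared as (flag, id, note); stored with the 4-byte id first.
columns = [
    Column("flag", permanent_id=1, storage_pos=2, display_pos=1),
    Column("id", permanent_id=2, storage_pos=1, display_pos=2),
    Column("note", permanent_id=3, storage_pos=3, display_pos=3),
]

print(display_order(columns))  # ['flag', 'id', 'note']
print(storage_order(columns))  # ['id', 'flag', 'note']
```

Anything that refers to a column by permanent_id is unaffected when either ordering changes, which is the point of keeping the id immutable.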


cheers

andrew



Re: [HACKERS] Column storage positions

2007-02-21 Thread elein

On Wed, Feb 21, 2007 at 08:33:10PM +0100, Florian G. Pflug wrote:
> Simon Riggs wrote:
> >On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
> >>On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> >>>I'd expect the system being able to reorder the columns to the most
> >>>efficient order possible (performance-wise and padding-saving-wise),
> >>>automatically.  When you create a table, sort the columns to the most
> >>>efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> >>>end of the tuple; and anything that requires a rewrite of the table
> >>>(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> >>>do it as well; and do it on TRUNCATE also) again recomputes the most
> >>>efficient order.
> >>That's exactly what I'm proposing.  On table creation, the system
> >>chooses an efficient column order for you. 
> >
> >That's fairly straightforward and beneficial. I much prefer Alvaro's
> >approach rather than the storage position details originally described.
> >Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
> >really don't think you want to go there.
> >
> >There is a problem: If people do a CREATE TABLE and then issue SELECT *
> >they will find the columns in a different order. That could actually
> >break some programs, so it isn't acceptable in all cases. e.g. COPY
> >without a column-list assumes that the incoming data should be assigned
> >to the table columns in the same order as the incoming data file.
> 
> But the display order (and hence the COPY order) of columns would still 
> be determined by attnum, not by some attstoragepos, no?
> The column reordering would only apply to the physical storage of 
> columns, not to how it's presented to the user I'd think.
> 
> The original idea was to add a third column, attdisplaypos, and let the 
> user choose the display ordering independent from the unique id 
> (attnum), which in turn is independent from the storage position.
> 
> For simplicity, the OP said he omitted the display-position part here,
> because it's really orthogonal to being able to modify the storage position.
> 

IMHO I think display order is very important to users.  First, don't
break the select *, no matter how bad it is to code that. Next, don't
break copy or pg_dump/restore.  We've fielded a lot of questions on
the ordering of columns for display and simplicity reasons.

The storage order is orthogonal to the display order.  display order can be
handled in attnum and the new storage order can be the new column.

--elein


Re: [HACKERS] Column storage positions

2007-02-21 Thread Gregory Stark

"Andrew Dunstan" <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
>> "Andrew Dunstan" <[EMAIL PROTECTED]> writes:
>>
>>   
>>> I would want to see this very carefully instrumented. Assuming we are
>>> putting all fixed size objects at the front, which seems like the best
>>> arrangement, then the position of every fixed field and the fixed portion
>>> of the position of every varlena field can be precalculated (and in the
>>> case of the leftmost varlena field that's its complete position).
>>
>> I'm not sure what you mean by "the fixed portion of the position of every
>> varlena field". Fields are just stuck one after the other (plus alignment)
>> skipping nulls. So any field after a null or varlena field can't have its
>> position cached at all.
>
> I'd forgotten about nulls :-( . Nevertheless, it's hard to think of a case
> where the penalty for shifting fixed size fields to the front is going to be
> very big. If we really wanted to optimise for speed for some varlena case,
> we'd probably need to keep stats on usage patterns, but that seems like
> massive overkill.

Oh, certainly, especially since only one varlena could ever be cached and soon
even that one won't be unless it's the very first column in the table. So
really, not worth thinking about.

Well the statistics we have do include the percentage of nulls in each column,
so we can sort columns by "fixed width not null" first, then "fixed width
nullable" by decreasing probability of being null, then varlenas.

But there's a tradeoff here. The more we try to optimize for cacheable offsets
the more difficult it will be to pack away the alignments.

Consider something like:

    int      not null
    boolean  not null
    int      null
    text     null

If we want we can pack this as int,int,boolean,text and (as long as the text
gets a 1-byte header) have them packed with no alignment padding.

But then the boolean can't use the cache whenever the int column is null. (the
offset will still be cached but it won't be used unless the int column is
non-null).

Alternatively we can pack this as int,boolean,int,text in which case the
boolean will *always* use the cache but it will be preceded by three wasted
padding bytes.

I tend to think the padding is more important than the caching because in
large systems the i/o speed dominates. But that doesn't mean the cpu cost is
negligible either. Especially on very wide tables.
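
The padding side of this tradeoff can be checked with a few lines of Python. This is only a back-of-the-envelope sketch, not backend code: it assumes a 4-byte int with 4-byte alignment, a 1-byte boolean with no alignment requirement, and a text value that gets a 1-byte header and therefore needs no alignment either. Only the header byte of the text is counted, and all columns are assumed non-null. The layout helper and the column names a to d are made up for this example.

```python
def layout(columns):
    """Return (offsets, padding) for columns given as (name, size, align)."""
    offset = 0
    padding = 0
    offsets = []
    for name, size, align in columns:
        # Round the current offset up to the next multiple of align.
        aligned = -(-offset // align) * align
        padding += aligned - offset
        offsets.append((name, aligned))
        offset = aligned + size
    return offsets, padding


# (name, size in bytes, alignment in bytes); assumed sizes, see above.
packed = [("a_int", 4, 4), ("c_int", 4, 4), ("b_bool", 1, 1), ("d_text", 1, 1)]
cache_friendly = [("a_int", 4, 4), ("b_bool", 1, 1), ("c_int", 4, 4), ("d_text", 1, 1)]

print(layout(packed))
# ([('a_int', 0), ('c_int', 4), ('b_bool', 8), ('d_text', 9)], 0)
print(layout(cache_friendly))
# ([('a_int', 0), ('b_bool', 4), ('c_int', 8), ('d_text', 12)], 3)
```

The second layout shows the three padding bytes inserted so that the nullable int can start on a 4-byte boundary.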

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan




Gregory Stark wrote:

> "Andrew Dunstan" <[EMAIL PROTECTED]> writes:
>
>> I would want to see this very carefully instrumented. Assuming we are putting
>> all fixed size objects at the front, which seems like the best arrangement,
>> then the position of every fixed field and the fixed portion of the position of
>> every varlena field can be precalculated (and in the case of the leftmost
>> varlena field that's its complete position).
>
> I'm not sure what you mean by "the fixed portion of the position of every
> varlena field". Fields are just stuck one after the other (plus alignment)
> skipping nulls. So any field after a null or varlena field can't have its
> position cached at all.

I'd forgotten about nulls :-( . Nevertheless, it's hard to think of a
case where the penalty for shifting fixed size fields to the front is
going to be very big. If we really wanted to optimise for speed for some
varlena case, we'd probably need to keep stats on usage patterns, but
that seems like massive overkill.


cheers

andrew


Re: [HACKERS] Column storage positions

2007-02-21 Thread Gregory Stark

"Andrew Dunstan" <[EMAIL PROTECTED]> writes:

> I would want to see this very carefully instrumented. Assuming we are putting
> all fixed size objects at the front, which seems like the best arrangement,
> then the position of every fixed field and the fixed portion of the position
> of every varlena field can be precalculated (and in the case of the leftmost
> varlena field that's its complete position).

I'm not sure what you mean by "the fixed portion of the position of every
varlena field". Fields are just stuck one after the other (plus alignment)
skipping nulls. So any field after a null or varlena field can't have its
position cached at all.

Sadly one effect of the 1-byte header varlenas is that the position of the
first varlena can't be cached any more. That's because its alignment depends
on whether you're storing a short varlena or a full 4-byte varlena.

Currently there's an exception for the first column of the table since that's
always at offset 0. We could add another exception and cache the first varlena
if it happens to land on an intaligned offset without any extra alignment. I'm
not sure if that pays for itself or not though. It still only helps 25% of the
time and only for the first varlena so it doesn't seem terribly important.
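
The 25% figure can be checked with a small sketch. It assumes, as a simplification of the description above, that a short (1-byte header) varlena starts immediately after the fixed-width data while a full 4-byte-header varlena is aligned to a multiple of 4. The function names are made up for the illustration.

```python
def first_varlena_offsets(fixed_end):
    """Possible start offsets of the first varlena after fixed_end bytes of fixed data."""
    short_header = fixed_end                 # 1-byte header: no alignment padding
    long_header = (fixed_end + 3) // 4 * 4   # 4-byte header: int-aligned
    return short_header, long_header

def offset_is_cacheable(fixed_end):
    """The offset can only be cached if it is the same for both header sizes."""
    short_header, long_header = first_varlena_offsets(fixed_end)
    return short_header == long_header

# Try every possible end offset of the fixed-width prefix from 0 to 15 bytes.
hits = [n for n in range(16) if offset_is_cacheable(n)]
fraction = len(hits) / 16
```

Only prefix lengths that are already a multiple of 4 qualify, which is one case in four if prefix lengths are evenly spread.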

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan




Florian G. Pflug wrote:
> BTW, this is a good case for why the storage order should - directly or
> indirectly - be tweakable. You can either optimize for space, and _then_
> for speed - which is what the OP did I think - or first for speed, and
> then for space. If the dba cannot choose the strategy, there will
> always be workloads where the engine does it the wrong way around.

Maybe a simple setting on ordering strategy would be OK. The chance of 
mucking it up if you can directly set the physical order seems just too 
great to me.


cheers

andrew


Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug


Stephan Szabo wrote:
> On Wed, 21 Feb 2007, Alvaro Herrera wrote:
> > Did I miss something in what you were trying to say?  I assume you must
> > already know this.
>
> I think so. What I was mentioning was that I was pretty sure that there
> was a message with someone saying that they actually tried something that
> did this and that they found left-most varchar access was slightly slower
> after the reordering although general access was faster. I believe the
> table case was alternating smallint and varchar columns, but I don't know
> what was tested for the retrieval. If that turns out to be able to be
> supported by other tests, then for some access patterns, the rearranged
> version might be slower.

Here is the original quote:

> The results were encouraging: on a table
> with 20 columns of alternating smallint and varchar(10) datatypes,
> selecting the max() of one of the rightmost int columns across 1
> million rows ran around 3 times faster.  The same query on the
> leftmost varchar column (which should suffer the most from this
> change) predictably got a little slower (about 10%);

What the OP doesn't mention is how the exact layouts looked before
and after the reordering - maybe a nullable fixed-length field
got moved before the varchar column in question, which would disable
offset caching I guess.

Let's say the reordering algorithm is changed to only move non-nullable
fixed-width columns to the left - can anyone see an access pattern that
would run slower after the reordering? I certainly can't - because the set
of columns for which offset caching works after the reordering would
be a superset of the one for which it works before the reordering.
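
The superset argument can be played through with a toy model. This is only a sketch under simplifying assumptions: a column's offset counts as cacheable when every column stored before it is fixed-width and NOT NULL, and alignment effects are ignored. The Col type and all function names are invented for the example.

```python
from typing import NamedTuple

class Col(NamedTuple):
    name: str
    fixed_width: bool
    not_null: bool

def is_plain(col):
    """True for columns that never disturb the offsets of later columns."""
    return col.fixed_width and col.not_null

def cacheable(cols):
    """Names of columns whose offset is the same in every tuple."""
    names = set()
    for col in cols:
        names.add(col.name)   # everything stored before it is plain
        if not is_plain(col):
            break             # nothing after this column can be cached
    return names

def reorder(cols):
    """Stable partition: NOT NULL fixed-width columns first, the rest unchanged."""
    return [c for c in cols if is_plain(c)] + [c for c in cols if not is_plain(c)]

table = [
    Col("s1", True, True),    # smallint NOT NULL
    Col("v1", False, True),   # varchar(10)
    Col("s2", True, False),   # nullable smallint, stays where it is
    Col("v2", False, True),   # varchar(10)
    Col("s3", True, True),    # smallint NOT NULL
]
before = cacheable(table)
after = cacheable(reorder(table))
```

In this model the leftmost varchar (v1) keeps its cacheable offset after the reordering, and s3 gains one.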

BTW, this is a good case for why the storage order should - directly or
indirectly - be tweakable. You can either optimize for space, and _then_
for speed - which is what the OP did I think - or first for speed, and then 
for space. If the dba cannot choose the strategy, there will always be 
workloads where the engine does it the wrong way around.


greetings, Florian Pflug


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan




Stephan Szabo wrote:
> What I was mentioning was that I was pretty sure that there
> was a message with someone saying that they actually tried something that
> did this and that they found left-most varchar access was slightly slower
> after the reordering although general access was faster. I believe the
> table case was alternating smallint and varchar columns, but I don't know
> what was tested for the retrieval. If that turns out to be able to be
> supported by other tests, then for some access patterns, the rearranged
> version might be slower.

I would want to see this very carefully instrumented. Assuming we are
putting all fixed size objects at the front, which seems like the best
arrangement, then the position of every fixed field and the fixed
portion of the position of every varlena field can be precalculated (and
in the case of the leftmost varlena field that's its complete
position). So the extra effort in getting to the leftmost varchar field
should be close to zero if this is done right, ISTM.


cheers

andrew


Re: [HACKERS] Column storage positions

2007-02-21 Thread Stephan Szabo

On Wed, 21 Feb 2007, Alvaro Herrera wrote:

> Stephan Szabo wrote:
> > On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:
> >
> > > On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
> > > > Well, for two reasons:
> > > >
> > > > 1) If you have a table with one very-frequently-accessed varchar()
> > > > column and several not-frequently-accessed int columns, it might
> > > > actually make sense to put the varchar column first.  The system won't
> > > > always be able to make the most intelligent decision about table
> > > > layout.
> > >
> > > Umm, the point of the exercise is that if you know there are int
> > > columns, then you can skip over them, whereas you can never skip over a
> > > varchar column. So there isn't really any situation where it would be
> > > better to put the varchar first.
> >
> > IIRC, in the first message in this thread, or another recent thread of
> > this type, someone tried a reordering example with alternating
> > smallints and varchar() and found that the leftmost varchar was
> > actually slower to access after reordering, so I'm not sure that we can
> > say there isn't a situation where it would affect things.
>
> Offsets are cached in tuple accesses, but the caching is obviously
> disabled for all attributes past any variable-length attribute.  So if
> you put a varlena attr in front, caching is completely disabled for all
> attrs (but that first one).  The automatic reordering algorithm must put
> all fixed-len attrs at the front, so that their offsets (and that of the
> first variable length attr) can be cached.
>
> Did I miss something in what you were trying to say?  I assume you must
> already know this.

I think so. What I was mentioning was that I was pretty sure that there
was a message with someone saying that they actually tried something that
did this and that they found left-most varchar access was slightly slower
after the reordering although general access was faster. I believe the
table case was alternating smallint and varchar columns, but I don't know
what was tested for the retrieval. If that turns out to be able to be
supported by other tests, then for some access patterns, the rearranged
version might be slower.


Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera

Phil Currier wrote:
> On 2/21/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> >So yes, there would be a use case for specifying the physical column layout
> >when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably
> >just update the physical column numbers itself. It's not like updating system
> >catalog tables directly is any more of an abstraction violation than swapping
> >files out from under the database...
> 
> If people are ok with that answer, then I'll gladly stop suggesting
> that ALTER TABLE be able to explicitly set storage positions.  I was
> just trying to avoid forcing a tool like pg_migrator to muck with the
> system catalogs.

I am ... that would be pg_migrator's goal anyway.  And it's certainly
going to need knowledge on how to go from one version to the next.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier


On 2/21/07, Gregory Stark <[EMAIL PROTECTED]> wrote:
> So yes, there would be a use case for specifying the physical column layout
> when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably
> just update the physical column numbers itself. It's not like updating system
> catalog tables directly is any more of an abstraction violation than swapping
> files out from under the database...


If people are ok with that answer, then I'll gladly stop suggesting
that ALTER TABLE be able to explicitly set storage positions.  I was
just trying to avoid forcing a tool like pg_migrator to muck with the
system catalogs.

phil


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan




Alvaro Herrera wrote:
> > I haven't understood Alvaro to suggest not keeping 3 numbers.
>
> Right, I'm not advocating not doing that -- I'm just saying that the
> first step to that could be decoupling physical position from attr id
> :-) Logical column ordering (the order in which SELECT * expands to)
> seems to me to be a different feature.


Except in the sense that divorcing the id from the storage order makes 
it possible to do sanely. :-)


Incidentally, I'm sure there would be a full scale revolt if there was a 
suggestion to alter the visible behaviour of SELECT *, COPY and other 
commands that rely on the logical ordering (which is currently, and 
unless we provide commands to alter it would stay as, the definition 
order). That's the order pg_dump should use IMNSHO - it should never 
have to worry about the physical order nor about explicitly setting the 
logical order.


cheers

andrew




Re: [HACKERS] Column storage positions

2007-02-21 Thread Gregory Stark

"Florian G. Pflug" <[EMAIL PROTECTED]> writes:

> Why would a pg_migrator style upgrade use pg_dump at all? I assumed it
> would rather copy the verbatim data from the old to the new catalog,
> only changing it if the layout of the tables in pg_catalog actually changed.

The way pg_migrator works is that it does a pg_dump to move the schema to the
new postgres. Then it transfers the files and drops them into place where the
new schema expects to find them.

So yes, there would be a use case for specifying the physical column layout
when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably
just update the physical column numbers itself. It's not like updating system
catalog tables directly is any more of an abstraction violation than swapping
files out from under the database...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera

Andrew Dunstan wrote:
> Simon Riggs wrote:
> >
> >I agree with comments here about the multiple orderings being a horrible
> >source of bugs, as well as lots of coding even to make it happen at all
> >http://archives.postgresql.org/pgsql-hackers/2006-12/msg00859.php
> 
> I thought we were going with this later proposal of Tom's (on which he's 
> convinced me): 
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php - if 
> not I'm totally confused (situation normal). The current thread started 
> with this sentence:
> 
> >Inspired by this thread [1], and in particular by the idea of storing
> >three numbers (permanent ID, on-disk storage position, display
> >position) for each column, I spent a little time messing around with a
> >prototype implementation of column storage positions to see what kind
> >of difference it would make.
> 
> I haven't understood Alvaro to suggest not keeping 3 numbers.

Right, I'm not advocating not doing that -- I'm just saying that the
first step to that could be decoupling physical position from attr id
:-) Logical column ordering (the order in which SELECT * expands to)
seems to me to be a different feature.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera

Stephan Szabo wrote:
> On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:
> 
> > On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
> > > Well, for two reasons:
> > >
> > > 1) If you have a table with one very-frequently-accessed varchar()
> > > column and several not-frequently-accessed int columns, it might
> > > actually make sense to put the varchar column first.  The system won't
> > > always be able to make the most intelligent decision about table
> > > layout.
> >
> > Umm, the point of the exercise is that if you know there are int
> > columns, then you can skip over them, whereas you can never skip over a
> > varchar column. So there isn't really any situation where it would be
> > better to put the varchar first.
> 
> IIRC, in the first message in this thread, or another recent thread of
> this type, someone tried a reordering example with alternating
> smallints and varchar() and found that the leftmost varchar was
> actually slower to access after reordering, so I'm not sure that we can
> say there isn't a situation where it would affect things.

Offsets are cached in tuple accesses, but the caching is obviously
disabled for all attributes past any variable-length attribute.  So if
you put a varlena attr in front, caching is completely disabled for all
attrs (but that first one).  The automatic reordering algorithm must put
all fixed-len attrs at the front, so that their offsets (and that of the
first variable length attr) can be cached.

Did I miss something in what you were trying to say?  I assume you must
already know this.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan


Simon Riggs wrote:
> I agree with comments here about the multiple orderings being a horrible
> source of bugs, as well as lots of coding even to make it happen at all
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00859.php

I thought we were going with this later proposal of Tom's (on which he's
convinced me):
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php - if
not I'm totally confused (situation normal). The current thread started
with this sentence:

> Inspired by this thread [1], and in particular by the idea of storing
> three numbers (permanent ID, on-disk storage position, display
> position) for each column, I spent a little time messing around with a
> prototype implementation of column storage positions to see what kind
> of difference it would make.


I haven't understood Alvaro to suggest not keeping 3 numbers.

cheers



andrew


Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug


Simon Riggs wrote:
> On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
> > On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> > > I'd expect the system being able to reorder the columns to the most
> > > efficient order possible (performance-wise and padding-saving-wise),
> > > automatically.  When you create a table, sort the columns to the most
> > > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > > end of the tuple; and anything that requires a rewrite of the table
> > > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > > do it as well; and do it on TRUNCATE also) again recomputes the most
> > > efficient order.
> >
> > That's exactly what I'm proposing.  On table creation, the system
> > chooses an efficient column order for you.
>
> That's fairly straightforward and beneficial. I much prefer Alvaro's
> approach rather than the storage position details originally described.
> Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
> really don't think you want to go there.
>
> There is a problem: If people do a CREATE TABLE and then issue SELECT *
> they will find the columns in a different order. That could actually
> break some programs, so it isn't acceptable in all cases. e.g. COPY
> without a column-list assumes that the incoming data should be assigned
> to the table columns in the same order as the incoming data file.

But the display order (and hence the COPY order) of columns would still
be determined by attnum, not by some attstoragepos, no?
The column reordering would only apply to the physical storage of
columns, not to how it's presented to the user I'd think.


The original idea was to add a third column, attdisplaypos, and let the 
user choose the display ordering independent from the unique id 
(attnum), which in turn is independent from the storage position.


For simplicity, the OP said he omitted the display-position part here,
because it's really orthogonal to being able to modify the storage position.
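
As a rough sketch of the three-number idea (permanent id, storage position, display position), each column can be modelled as a record carrying all three, with each ordering derived independently. attnum, attstoragepos and attdisplaypos are the names used in this thread; the Attribute class, the helper functions and the sample values are assumptions for the illustration, not actual catalog code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    name: str
    attnum: int          # permanent id, never changes
    attstoragepos: int   # position within the on-disk tuple
    attdisplaypos: int   # position in SELECT * and COPY output

def storage_order(attrs):
    """Column names in the order they are laid out on disk."""
    return [a.name for a in sorted(attrs, key=lambda a: a.attstoragepos)]

def display_order(attrs):
    """Column names in the order the user sees them."""
    return [a.name for a in sorted(attrs, key=lambda a: a.attdisplaypos)]

# A table declared as (a varchar(10), b int) whose int column was moved
# to the front on disk; the user-visible order is untouched.
foo = [
    Attribute("a", attnum=1, attstoragepos=2, attdisplaypos=1),
    Attribute("b", attnum=2, attstoragepos=1, attdisplaypos=2),
]
```

Changing one ordering leaves the other alone, which is why the display-position part can be left out without blocking the storage-position part.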

greetings, Florian Pflug


Re: [HACKERS] Column storage positions

2007-02-21 Thread Stephan Szabo

On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:

> On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
> > Well, for two reasons:
> >
> > 1) If you have a table with one very-frequently-accessed varchar()
> > column and several not-frequently-accessed int columns, it might
> > actually make sense to put the varchar column first.  The system won't
> > always be able to make the most intelligent decision about table
> > layout.
>
> Umm, the point of the exercise is that if you know there are int
> columns, then you can skip over them, whereas you can never skip over a
> varchar column. So there isn't really any situation where it would be
> better to put the varchar first.

IIRC, in the first message in this thread, or another recent thread of
this type, someone tried a reordering example with alternating
smallints and varchar() and found that the leftmost varchar was
actually slower to access after reordering, so I'm not sure that we can
say there isn't a situation where it would affect things.


Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug

Phil Currier wrote:
> On 2/21/07, Martijn van Oosterhout wrote:
> > > don't see any good way to perform an upgrade between PG versions
> > > without rewriting each table's data.  Maybe most people aren't doing
> > > upgrades like this right now, but it seems like it will only become
> > > more common in the future.  In my opinion, this is more important than
> > > #1.
> >
> > I don't see this either. For all current tables, the storage position
> > is the attribute number, no exception. You say:
> >
> > > because the version X table could
> > > have dropped columns that might or might not be present in any given
> > > tuple on disk.
> >
> > Whether they're there or not is irrelevant. Dropped columns are not
> > necessarily empty, but in any case they occupy a storage position until
> > the table is rewritten. A dump/restore doesn't need to preserve this,
> > but pg_migrator will need some smarts to handle it. The system will
> > need to create a column of the appropriate type and drop it to get to
> > the right state.
>
> I agree, a dump/restore that rewrites all the table datafiles doesn't
> need to handle this.  And I agree that the system will need to create
> dropped columns and then drop them again, that's exactly what I
> suggested in fact.  We're talking about pg_migrator-style upgrades
> only here.

Why would a pg_migrator style upgrade use pg_dump at all? I assumed it
would rather copy the verbatim data from the old to the new catalog,
only changing it if the layout of the tables in pg_catalog actually 
changed.

greetings, Florian Pflug


Re: [HACKERS] Column storage positions

2007-02-21 Thread Simon Riggs

On Wed, 2007-02-21 at 13:16 -0500, Andrew Dunstan wrote:
> Simon Riggs wrote:
> > On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
> >   
> >> On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> >> 
> >>> I'd expect the system being able to reorder the columns to the most
> >>> efficient order possible (performance-wise and padding-saving-wise),
> >>> automatically.  When you create a table, sort the columns to the most
> >>> efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> >>> end of the tuple; and anything that requires a rewrite of the table
> >>> (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> >>> do it as well; and do it on TRUNCATE also) again recomputes the most
> >>> efficient order.
> >>>   
> >> That's exactly what I'm proposing.  On table creation, the system
> >> chooses an efficient column order for you. 
> >> 
> >
> > That's fairly straightforward and beneficial. I much prefer Alvaro's
> > approach rather than the storage position details originally described.
> > Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
> > really don't think you want to go there.
> >
> > There is a problem: If people do a CREATE TABLE and then issue SELECT *
> > they will find the columns in a different order. That could actually
> > break some programs, so it isn't acceptable in all cases. e.g. COPY
> > without a column-list assumes that the incoming data should be assigned
> > to the table columns in the same order as the incoming data file.
> >   
> 
> You seem to have missed that we will be separating logical from physical 
> ordering. Each attribute will have a permanent id, a physical ordering 
> and a logical ordering. You can change either ordering without affecting 
> the other.

I missed nothing, AFAICS. My understanding was that Alvaro was proposing
to have just a simple physical re-ordering and that would be altered at
CREATE TABLE time. No complexity of multiple column orderings: nice,
simple and effective. My only addition was to say: must be optional.

> COPY, SELECT and all user-visible commands should follow the logical 
> ordering, not the physical ordering, which should be completely 
> invisible to SQL.

I agree with comments here about the multiple orderings being a horrible
source of bugs, as well as lots of coding even to make it happen at all
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00859.php

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan


Simon Riggs wrote:
> On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
> > On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> > > I'd expect the system being able to reorder the columns to the most
> > > efficient order possible (performance-wise and padding-saving-wise),
> > > automatically.  When you create a table, sort the columns to the most
> > > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > > end of the tuple; and anything that requires a rewrite of the table
> > > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > > do it as well; and do it on TRUNCATE also) again recomputes the most
> > > efficient order.
> >
> > That's exactly what I'm proposing.  On table creation, the system
> > chooses an efficient column order for you.
>
> That's fairly straightforward and beneficial. I much prefer Alvaro's
> approach rather than the storage position details originally described.
> Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
> really don't think you want to go there.
>
> There is a problem: If people do a CREATE TABLE and then issue SELECT *
> they will find the columns in a different order. That could actually
> break some programs, so it isn't acceptable in all cases. e.g. COPY
> without a column-list assumes that the incoming data should be assigned
> to the table columns in the same order as the incoming data file.


You seem to have missed that we will be separating logical from physical 
ordering. Each attribute will have a permanent id, a physical ordering 
and a logical ordering. You can change either ordering without affecting 
the other.


COPY, SELECT and all user-visible commands should follow the logical 
ordering, not the physical ordering, which should be completely 
invisible to SQL.


cheers

andrew




Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier

On 2/21/07, Martijn van Oosterhout wrote:
> > don't see any good way to perform an upgrade between PG versions
> > without rewriting each table's data.  Maybe most people aren't doing
> > upgrades like this right now, but it seems like it will only become
> > more common in the future.  In my opinion, this is more important than
> > #1.
>
> I don't see this either. For all current tables, the storage position
> is the attribute number, no exception. You say:
>
> > because the version X table could
> > have dropped columns that might or might not be present in any given
> > tuple on disk.
>
> Whether they're there or not is irrelevant. Dropped columns are not
> necessarily empty, but in any case they occupy a storage position until
> the table is rewritten. A dump/restore doesn't need to preserve this,
> but pg_migrator will need some smarts to handle it. The system will
> need to create a column of the appropriate type and drop it to get to
> the right state.

I agree, a dump/restore that rewrites all the table datafiles doesn't
need to handle this.  And I agree that the system will need to create
dropped columns and then drop them again, that's exactly what I
suggested in fact.  We're talking about pg_migrator-style upgrades
only here.

Say we do this in 8.2:

create table foo (a varchar(10), b int);
insert into foo 
alter table foo add column c int;

At this point, the column storage order is (a, b, c) because 8.2 never
changes storage order.  Then you upgrade to 8.3.  pg_dump now wants to
write out some DDL that will create a table matching the existing
table datafile, since we don't want to have to rewrite it.  pg_dump
prints out:

create table foo (a varchar(10), b int, c int);

The 8.3 system will try to create the table with column order (b, c,
a), since it's trying to optimize storage order, and that won't match
the existing table datafile.  What we need is a way to make sure that
the table matches the original datafile.

Now say that it's not an 8.2 -> 8.3 upgrade, say it's an 8.3 -> 8.4
upgrade.  In this case, 8.3 would have the table with storage order
(b, a, c).  (Column c would have been added at the end since it was
added without a default, and didn't force a table rewrite.)  How do
you get pg_dump to print out table creation DDL that will result in a
table matching the existing (b, a, c) table datafile?
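
The mismatch in the 8.3 -> 8.4 scenario can be replayed with a small model. This is only a sketch: optimized_order stands in for whatever reordering policy the server would apply at CREATE TABLE time (here simply "fixed-width columns first"), and every name in it is hypothetical.

```python
FIXED_WIDTH = {"int"}  # assumption: the only fixed-width type in this example

def optimized_order(columns):
    """Hypothetical CREATE TABLE policy: fixed-width columns first, stable otherwise."""
    fixed = [name for name, typ in columns if typ in FIXED_WIDTH]
    rest = [name for name, typ in columns if typ not in FIXED_WIDTH]
    return fixed + rest

# History of the table on the old server.
created = [("a", "varchar(10)"), ("b", "int")]
on_disk = optimized_order(created) + ["c"]   # ADD COLUMN appends, no rewrite

# What a plain CREATE TABLE from pg_dump would produce on the new server.
dumped = created + [("c", "int")]
restored = optimized_order(dumped)
```

The old datafile is laid out as (b, a, c) while the freshly created table expects (b, c, a), so the files cannot simply be dropped into place unless the storage positions can be set explicitly.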

This is why I think pg_dump needs to be able to print an ALTER TABLE
statement that will explicitly assign storage positions.  This happens
to have the side-effect of being potentially useful to admins who
might want control over that.

If this only affected 8.2 -> 8.3 upgrades, then maybe it's not as
important an issue.  But I think it affects *all* future upgrades,
which is why I'm trying to raise the issue now.

> If you really want to use pg_dump I'd suggest an option to pg_dump
> --dump-dropped-columns which will include the dropped columns in the
> CREATE TABLE but drop them immediately after. It's really more a corner
> case than anything else.


Re: [HACKERS] Column storage positions

2007-02-21 Thread Simon Riggs

On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
> On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> > I'd expect the system being able to reorder the columns to the most
> > efficient order possible (performance-wise and padding-saving-wise),
> > automatically.  When you create a table, sort the columns to the most
> > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > end of the tuple; and anything that requires a rewrite of the table
> > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > do it as well; and do it on TRUNCATE also) again recomputes the most
> > efficient order.
> 
> That's exactly what I'm proposing.  On table creation, the system
> chooses an efficient column order for you. 

That's fairly straightforward and beneficial. I much prefer Alvaro's
approach rather than the storage position details originally described.
Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
really don't think you want to go there.

There is a problem: If people do a CREATE TABLE and then issue SELECT *
they will find the columns in a different order. That could actually
break some programs, so it isn't acceptable in all cases. e.g. COPY
without a column-list assumes that the incoming data should be assigned
to the table columns in the same order as the incoming data file.

So if we do this, it should be controllable using a GUC: 
optimize_column_order = off (default) | on
This should be a USERSET, so different users can create tables in either
full control or optimised mode, as they choose.

It should be possible to do that with the minimum number of position
swaps, so that people who have ordered the columns according to usage
frequency would still get what they wanted.

>  The next time an ALTER
> TABLE operation forces a rewrite, the system would recompute the
> column storage order.  I hadn't thought of having CLUSTER also redo
> the storage order, but that seems safe since it takes an exclusive
> lock on the table.  I'm less sure about whether it's safe to do this
> during a TRUNCATE.

The GUC should apply to whenever/wherever this optimization occurs.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Column storage positions

2007-02-21 Thread Martijn van Oosterhout

On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
> Well, for two reasons:
> 
> 1) If you have a table with one very-frequently-accessed varchar()
> column and several not-frequently-accessed int columns, it might
> actually make sense to put the varchar column first.  The system won't
> always be able to make the most intelligent decision about table
> layout.

Umm, the point of the exercise is that if you know there are int
columns, then you can skip over them, whereas you can never skip over a
varchar column. So there isn't really any situation where it would be
better to put the varchar first.
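
The skip-over argument can be illustrated with a small model. This is only a sketch, not PostgreSQL's tuple-deforming code: the function name and the width table are made up for illustration, and NULLs and alignment are ignored.

```python
# Illustrative model only; not PostgreSQL's actual tuple-deforming code.
# Assumed fixed widths in bytes (hypothetical table for this sketch).
FIXED_WIDTHS = {"smallint": 2, "int": 4, "bigint": 8}


def fields_to_inspect(columns, target):
    """Count stored values that must be read to locate column `target`.

    `columns` is a list of (name, type) pairs in storage order. Fixed-width
    columns can be skipped using widths known from the table definition,
    but every variable-width value stored before the target has to be
    inspected to learn its length.
    """
    inspected = 0
    for name, typ in columns:
        if name == target:
            return inspected
        if typ not in FIXED_WIDTHS:
            inspected += 1  # must peek at this value to find its length
    raise KeyError(target)


varchar_first = [("a", "varchar"), ("b", "int"), ("c", "int")]
varchar_last = [("b", "int"), ("c", "int"), ("a", "varchar")]
```

In this model, reaching "c" requires inspecting one value when the varchar is stored first and none when it is stored last, and reaching "a" itself needs no inspection in either layout.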

> 2) As I described in my original email, without this capability, I
> don't see any good way to perform an upgrade between PG versions
> without rewriting each table's data.  Maybe most people aren't doing
> upgrades like this right now, but it seems like it will only become
> more common in the future.  In my opinion, this is more important than
> #1.

I don't see this either. For all current tables, the storage position
is the attribute number, no exception. You say:

> because the version X table could
> have dropped columns that might or might not be present in any given
> tuple on disk. 

Whether they're there or not is irrelevant. Dropped columns are not
necessarily empty, but in any case they occupy a storage position until
the table is rewritten. A dump/restore doesn't need to preserve this,
but pg_migrator will need some smarts to handle it. The system will
need to create a column of the appropriate type and drop it to get to
the right state.

If you really want to use pg_dump, I'd suggest a pg_dump option,
--dump-dropped-columns, which would include the dropped columns in the
CREATE TABLE but drop them immediately afterwards. It's really more a corner
case than anything else.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug


Andrew Dunstan wrote:

Florian G. Pflug wrote:


I think you'd want to have a flag per field that tells you if the user
has overridden the storage pos for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin has made on purpose. The same holds
true for a pg_dump/pg_reload cycle. If none of the fields had their
storage order changed manually, you'd want to reorder them optimally
at dump/reload time. If, however, the admin specified an ordering, you'd
want to preserve that.



I don't think users should be monkeying with the storage position at 
all. Decisions about that should belong to the engine, not to users. 
Providing a user tweakable knob for this strikes me as a large footgun, 
as well as requiring all sorts of extra checks along the lines you are 
talking of.


Maybe you shouldn't support specifying the storage order directly, but
rather through some kind of "priority field". The idea would be that
the storage order is determined by sorting the fields according to
the priority field. Groups of fields with the same priority would
get ordered for maximal space efficiency.
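
A minimal sketch of the priority idea, under stated assumptions: the tuple layout (name, priority, width), the convention that a lower priority number sorts earlier, and the within-group rule (fixed-width before variable-width, then increasing width) are all stand-ins chosen for illustration, not part of any patch.

```python
def storage_order(columns):
    """Order columns by user priority, then by a space-oriented rule.

    `columns` is a list of (name, priority, width) tuples, where `width`
    is the size in bytes or None for a variable-width column. Lower
    priority numbers are stored earlier (an assumption of this sketch).
    """
    def key(col):
        _name, priority, width = col
        is_variable = width is None
        # Within one priority group: fixed-width first, smallest first.
        return (priority, is_variable, 0 if is_variable else width)

    return [name for name, _priority, _width in sorted(columns, key=key)]


example = [
    ("note", 1, None),
    ("id", 0, 4),
    ("flag", 1, 2),
    ("total", 1, 8),
    ("code", 0, 2),
]
```

Columns the admin leaves at the same priority are then free to be rearranged by the system, while an explicit priority keeps a column ahead of or behind the rest.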

greetings, Florian Pflug

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug


Martijn van Oosterhout wrote:

On Wed, Feb 21, 2007 at 03:59:12PM +0100, Florian G. Pflug wrote:

I think you'd want to have a flag per field that tells you if the user
has overridden the storage pos for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin has made on purpose.


Why would you want to let the admin have any say at all about the
storage order?


It wasn't my idea - the OP proposed an "alter table ... alter column
... set storage position ..." command. But if you're going to decouple
the storage order from the attnum, then why not let the DBA tweak it?

Since you have at least two possible optimization goals - for size, or
for fast access to specific fields - creating a one-size-fits-all ordering
rule seems hard...

greetings, Florian Pflug


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier

On 2/21/07, Martijn van Oosterhout  wrote:

On Wed, Feb 21, 2007 at 03:59:12PM +0100, Florian G. Pflug wrote:
> I think you'd want to have a flag per field that tells you if the user
> has overridden the storage pos for that specific field. Otherwise,
> the next time you have the chance to optimize the ordering, you might
> throw away changes that the admin has made on purpose.

Why would you want to let the admin have any say at all about the
storage order?

Well, for two reasons:

1) If you have a table with one very-frequently-accessed varchar()
column and several not-frequently-accessed int columns, it might
actually make sense to put the varchar column first.  The system won't
always be able to make the most intelligent decision about table
layout.

2) As I described in my original email, without this capability, I
don't see any good way to perform an upgrade between PG versions
without rewriting each table's data.  Maybe most people aren't doing
upgrades like this right now, but it seems like it will only become
more common in the future.  In my opinion, this is more important than
#1.

But I understand that it's a potential foot-gun, so I'm happy to drop
it.  It would be nice though if there were some ideas about how to
address problem #2 at least.

phil

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate

Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan


Florian G. Pflug wrote:


I think you'd want to have a flag per field that tells you if the user
has overridden the storage pos for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin has made on purpose. The same holds
true for a pg_dump/pg_reload cycle. If none of the fields had their
storage order changed manually, you'd want to reorder them optimally
at dump/reload time. If, however, the admin specified an ordering, you'd
want to preserve that.



I don't think users should be monkeying with the storage position at 
all. Decisions about that should belong to the engine, not to users. 
Providing a user tweakable knob for this strikes me as a large footgun, 
as well as requiring all sorts of extra checks along the lines you are 
talking of.


cheers

andrew


Re: [HACKERS] Column storage positions

2007-02-21 Thread Martijn van Oosterhout

On Wed, Feb 21, 2007 at 03:59:12PM +0100, Florian G. Pflug wrote:
> I think you'd want to have a flag per field that tells you if the user
> has overridden the storage pos for that specific field. Otherwise,
> the next time you have the chance to optimize the ordering, you might
> throw away changes that the admin has made on purpose.

Why would you want to let the admin have any say at all about the
storage order?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.



Re: [HACKERS] Column storage positions

2007-02-21 Thread Bruce Momjian

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > Phil Currier wrote:
> > > On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> > > > I'd expect the system to be able to reorder the columns to the most
> > > > efficient order possible (performance-wise and padding-saving-wise),
> > > > automatically.  When you create a table, sort the columns to the most
> > > > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > > > end of the tuple; and anything that requires a rewrite of the table
> > > > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > > > do it as well; and do it on TRUNCATE also) again recomputes the most
> > > > efficient order.
> > > 
> > > That's exactly what I'm proposing.  On table creation, the system
> > > chooses an efficient column order for you.  The next time an ALTER
> > > TABLE operation forces a rewrite, the system would recompute the
> > > column storage order.  I hadn't thought of having CLUSTER also redo
> > > the storage order, but that seems safe since it takes an exclusive
> > > lock on the table.  I'm less sure about whether it's safe to do this
> > > during a TRUNCATE.
> > 
> > Keep in mind we have a patch in process to reduce the varlena length and
> > reduce alignment requirements, so once that is in, reordering columns
> > will not be as important.
> 
> Yes, but the "cache offset" stuff is still significant, so there will be
> some benefit in putting all the fixed-length attributes at the start of
> the tuple, and varlena atts grouped at the end.

Agreed.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Column storage positions

2007-02-21 Thread Bruce Momjian

Phil Currier wrote:
> On 2/21/07, Bruce Momjian <[EMAIL PROTECTED]> wrote:
> > Keep in mind we have a patch in process to reduce the varlena length and
> > reduce alignment requirements, so once that is in, reordering columns
> > will not be as important.
> 
> Well, as I understand it, that patch isn't really addressing the same
> problem.  Consider this table:
> create table foo (a varchar(10), b int, c smallint, d int, e smallint, );
> 
> There are two problems here:
> 
> 1) On my machine, each int/smallint column pair takes up 8 bytes.  2
> of those 8 bytes are alignment padding wasted on the smallint field.
> If we grouped all the smallint fields together within the tuple, that
> space would not be lost.

Yes, good point.

> 2) Each time you access any of the int/smallint fields, you have to
> peek inside the varchar field to figure out its length.  If we stored
> the varchar field at the end of the tuple instead, the access times
> for all the other fields would be measurably improved, by a factor
> that greatly outweighs the small penalty imposed on the varchar field
> itself.
> 
> My understanding is that the varlena headers patch would potentially
> reduce the size of the varchar header (which is definitely worthwhile
> by itself), but it wouldn't help much for either of these problems.
> Or am I misunderstanding what that patch does?
> 

Agreed.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier


On 2/21/07, Bruce Momjian <[EMAIL PROTECTED]> wrote:

Keep in mind we have a patch in process to reduce the varlena length and
reduce alignment requirements, so once that is in, reordering columns
will not be as important.


Well, as I understand it, that patch isn't really addressing the same
problem.  Consider this table:
create table foo (a varchar(10), b int, c smallint, d int, e smallint, );

There are two problems here:

1) On my machine, each int/smallint column pair takes up 8 bytes.  2
of those 8 bytes are alignment padding wasted on the smallint field.
If we grouped all the smallint fields together within the tuple, that
space would not be lost.

2) Each time you access any of the int/smallint fields, you have to
peek inside the varchar field to figure out its length.  If we stored
the varchar field at the end of the tuple instead, the access times
for all the other fields would be measurably improved, by a factor
that greatly outweighs the small penalty imposed on the varchar field
itself.

My understanding is that the varlena headers patch would potentially
reduce the size of the varchar header (which is definitely worthwhile
by itself), but it wouldn't help much for either of these problems.
Or am I misunderstanding what that patch does?
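
The padding arithmetic in point 1 can be checked with a short calculation. This is a sketch under assumptions: the widths and alignments below (4-byte int aligned to 4, 2-byte smallint aligned to 2) are typical but platform-dependent, and the model ignores the tuple header, the leading varchar, and any trailing padding to the row's overall alignment.

```python
# Assumed (width, alignment) in bytes; platform-dependent in practice.
TYPES = {"int": (4, 4), "smallint": (2, 2)}


def data_size(column_types):
    """Return (total bytes, padding bytes) for columns laid out in order."""
    offset = 0
    padding = 0
    for typ in column_types:
        width, align = TYPES[typ]
        pad = (-offset) % align  # bytes needed to reach the next aligned offset
        padding += pad
        offset += pad + width
    return offset, padding


# The fixed-width columns b, c, d, e of table foo, as declared and regrouped.
interleaved = ["int", "smallint", "int", "smallint"]
grouped = ["int", "int", "smallint", "smallint"]
```

Under these assumptions the declared order needs 14 bytes, 2 of them padding before the second int, while the grouped order needs 12 bytes with no padding.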

phil


Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera

Bruce Momjian wrote:
> Phil Currier wrote:
> > On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> > > I'd expect the system to be able to reorder the columns to the most
> > > efficient order possible (performance-wise and padding-saving-wise),
> > > automatically.  When you create a table, sort the columns to the most
> > > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > > end of the tuple; and anything that requires a rewrite of the table
> > > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > > do it as well; and do it on TRUNCATE also) again recomputes the most
> > > efficient order.
> > 
> > That's exactly what I'm proposing.  On table creation, the system
> > chooses an efficient column order for you.  The next time an ALTER
> > TABLE operation forces a rewrite, the system would recompute the
> > column storage order.  I hadn't thought of having CLUSTER also redo
> > the storage order, but that seems safe since it takes an exclusive
> > lock on the table.  I'm less sure about whether it's safe to do this
> > during a TRUNCATE.
> 
> Keep in mind we have a patch in process to reduce the varlena length and
> reduce alignment requirements, so once that is in, reordering columns
> will not be as important.

Yes, but the "cache offset" stuff is still significant, so there will be
some benefit in putting all the fixed-length attributes at the start of
the tuple, and varlena atts grouped at the end.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug


Phil Currier wrote:

On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:

I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.


That's exactly what I'm proposing.  On table creation, the system
chooses an efficient column order for you.  The next time an ALTER
TABLE operation forces a rewrite, the system would recompute the
column storage order.  I hadn't thought of having CLUSTER also redo
the storage order, but that seems safe since it takes an exclusive
lock on the table.  I'm less sure about whether it's safe to do this
during a TRUNCATE.


I think you'd want to have a flag per field that tells you if the user
has overridden the storage pos for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin has made on purpose. The same holds
true for a pg_dump/pg_reload cycle. If none of the fields had their
storage order changed manually, you'd want to reorder them optimally
at dump/reload time. If, however, the admin specified an ordering, you'd
want to preserve that.

greetings, Florian Pflug


Re: [HACKERS] Column storage positions

2007-02-21 Thread Bruce Momjian

Phil Currier wrote:
> On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> > I'd expect the system to be able to reorder the columns to the most
> > efficient order possible (performance-wise and padding-saving-wise),
> > automatically.  When you create a table, sort the columns to the most
> > efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
> > end of the tuple; and anything that requires a rewrite of the table
> > (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
> > do it as well; and do it on TRUNCATE also) again recomputes the most
> > efficient order.
> 
> That's exactly what I'm proposing.  On table creation, the system
> chooses an efficient column order for you.  The next time an ALTER
> TABLE operation forces a rewrite, the system would recompute the
> column storage order.  I hadn't thought of having CLUSTER also redo
> the storage order, but that seems safe since it takes an exclusive
> lock on the table.  I'm less sure about whether it's safe to do this
> during a TRUNCATE.

Keep in mind we have a patch in process to reduce the varlena length and
reduce alignment requirements, so once that is in, reordering columns
will not be as important.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier


On 2/21/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:

I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.


That's exactly what I'm proposing.  On table creation, the system
chooses an efficient column order for you.  The next time an ALTER
TABLE operation forces a rewrite, the system would recompute the
column storage order.  I hadn't thought of having CLUSTER also redo
the storage order, but that seems safe since it takes an exclusive
lock on the table.  I'm less sure about whether it's safe to do this
during a TRUNCATE.

phil

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera

Phil Currier wrote:
> Inspired by this thread [1], and in particular by the idea of storing
> three numbers (permanent ID, on-disk storage position, display
> position) for each column, I spent a little time messing around with a
> prototype implementation of column storage positions to see what kind
> of difference it would make.  The results were encouraging: on a table
> with 20 columns of alternating smallint and varchar(10) datatypes,
> selecting the max() of one of the rightmost int columns across 1
> million rows ran around 3 times faster.

[snipped]

I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [HACKERS] Column storage positions

2007-02-20 Thread Robert Treat

On Tuesday 20 February 2007 16:07, Phil Currier wrote:
> Another problem relates to upgrades.  With tools like pg_migrator now
> on pgfoundry, people will eventually expect quick upgrades that don't
> require rewriting each table's data.  Storage positions would cause a
> problem for every version X -> version Y upgrade with Y >= 8.3, even
> when X is also >= 8.3, because a version X table could always have
> been altered without a rewrite into a structure different from what
> Y's CREATE TABLE will choose.  

If you are using pg_migrator, you're not going to be moving the datafiles on disk
anyway, so pg_migrator's behavior shouldn't change terribly.  If you're doing a
pg_dump-based upgrade, presumably pg_dump could write its CREATE statements
with the columns in attstorpos order and set attnum = attstorpos, preserving
the physical layout from the previous install.

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] Column storage positions

2007-02-20 Thread Sergey E. Koposov



Just my 2 cents on the proposed idea: I want to demonstrate that it is
very relevant for performance.

I recently did a migration from PG 8.1 to PG 8.2. During that time I was
dumping a 2TB database with several very wide tables (having ~200
columns). And I saw that on my pretty powerful server with 8Gb
RAM, an Itanium2 processor, and a large RAID which can do I/O at 100Mb/sec, the
performance of pg_dump was CPU limited, and the read speed of the tables
was 1-1.5mb/sec (leading to a 2 week dumping time).


I was very surprised by these times, and profiled postgres to find the
reason for that:

Here is the top of the gprof output:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 60.72     13.52    13.52  6769826     0.00     0.00  nocachegetattr
 10.58     15.88     2.36  9035566     0.00     0.00  CopyAttributeOutText
  7.22     17.49     1.61 65009457     0.00     0.00  CopySendData
  6.34     18.90     1.41        1     1.41    22.21  CopyTo

So the main slow-down of the process was all this code recomputing the
boundaries of the columns. I checked that by removing one tiny varchar
column and COALESCING all NULLs, and after that the performance of
pg_dump increased by more than a factor of 2!


I should have reported that experience earlier... but I hope that my 
observations can be useful in the context of Phil's idea.


regards,
Sergey

***
Sergey E. Koposov
Max Planck Institute for Astronomy/Cambridge Institute for Astronomy/Sternberg 
Astronomical Institute
Tel: +49-6221-528-349
Web: http://lnfm1.sai.msu.ru/~math
E-mail: [EMAIL PROTECTED]


[HACKERS] Column storage positions

2007-02-20 Thread Phil Currier


Inspired by this thread [1], and in particular by the idea of storing
three numbers (permanent ID, on-disk storage position, display
position) for each column, I spent a little time messing around with a
prototype implementation of column storage positions to see what kind
of difference it would make.  The results were encouraging: on a table
with 20 columns of alternating smallint and varchar(10) datatypes,
selecting the max() of one of the rightmost int columns across 1
million rows ran around 3 times faster.  The same query on the
leftmost varchar column (which should suffer the most from this
change) predictably got a little slower (about 10%); I couldn't
measure a performance drop on the rightmost varchar columns.  The
table's size didn't drop much in this case, but a different table of
20 alternating int and smallint columns showed a 20% slimmer disk
footprint, pretty much as expected.  Pgbenching showed no measurable
difference, which isn't surprising since the pgbench test tables
consist of just int values with char filler at the end.

So here is a proposal for separating a column's storage position from
its permanent ID.  I've ignored the display position piece of the
original thread because display positions don't do much other than
save you the hassle of creating a view on top of your table, while
storage positions have demonstrable, tangible benefits.  And there is
no reason to connect the two features; display positions can easily be
added separately at a later point.

We want to decouple a column's on-disk storage position from its
permanent ID for two reasons: to minimize the space lost to alignment
padding between fields, and to speed up access to individual fields.
The system will automatically assign new storage positions when a
table is created, and when a table alteration requires a rewrite
(currently just adding a column with a default, or changing a column
datatype).  To allow users to optimize tables based on the fields they
know will be frequently accessed, I think we should extend ALTER TABLE
to accept user-assigned storage positions (something like "ALTER TABLE
ALTER col SET STORAGE POSITION X").  This command would also be useful
for another reason discussed below.

In my prototype, I used these rules to determine columns' storage order:
1) fixed-width fields before variable-width, dropped columns always last
2) fixed-width fields ordered by increasing size
3) not-null fields before nullable fields
There are other approaches worth considering - for example, you could
imagine swapping the priority of rules 2 and 3.  Resultant tables
would generally have more alignment waste, but would tend to have
slightly faster field access.  I'm really not sure what the optimal
strategy is since every user will have a slightly different metric for
"optimal".  In any event, either of these approaches is better than
the current situation.
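
The three rules, and the variant with rules 2 and 3 swapped, can be written as a sort key. This is a sketch only, not the prototype's code: the Column type, its field names, and the swap_rules_2_and_3 flag are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class Column(NamedTuple):
    name: str
    width: Optional[int]  # size in bytes; None means variable-width
    not_null: bool = False
    dropped: bool = False


def assign_storage_order(columns, swap_rules_2_and_3=False):
    """Sort columns into storage order using the three rules above."""
    def key(col):
        is_variable = col.width is None
        size = 0 if is_variable else col.width  # rule 2 only affects fixed-width
        nullable = not col.not_null
        # Rule 1 always comes first: dropped last, fixed before variable.
        if swap_rules_2_and_3:
            return (col.dropped, is_variable, nullable, size)
        return (col.dropped, is_variable, size, nullable)

    return sorted(columns, key=key)


cols = [
    Column("a", None),
    Column("b", 4, not_null=True),
    Column("c", 2),
    Column("d", 4),
    Column("e", 2, not_null=True),
    Column("old", 2, dropped=True),
]
```

With the default priority the result is e, c, b, d, a, old; with rules 2 and 3 swapped it is e, b, c, d, a, old, which places a nullable smallint between two 4-byte columns and so can cost alignment padding.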

To implement this, we'll need a field (perhaps attstoragepos?) in
pg_attribute to hold the storage position.  It will equal attnum until
it is explicitly reassigned.  The routines in heaptuple.c need to
quickly loop through the fields of a tuple in storage order rather
than attnum order, so I propose extending TupleDesc to hold an
"attrspos" array that sits alongside the attrs array.  In the
prototype I used an array of int2 indices into the attrs array,
ordered by storage position.
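
As a rough illustration of the index array, assuming 0-based positions and a hypothetical helper name (the real structure would live in C inside TupleDesc):

```python
def build_attrspos(storage_positions):
    """Return indices into the attrs array, ordered by storage position.

    `storage_positions[i]` is the storage position of attribute i
    (0-based here for simplicity; this helper is illustrative only).
    """
    return sorted(range(len(storage_positions)),
                  key=lambda i: storage_positions[i])


attrs = ["a", "b", "c", "d"]    # attributes in attnum order
storage_positions = [3, 0, 2, 1]  # where each attribute is stored on disk
attrspos = build_attrspos(storage_positions)

# Walking the tuple in storage order then becomes a plain loop.
in_storage_order = [attrs[i] for i in attrspos]
```

Here attrspos is [1, 3, 2, 0], so the loop visits b, d, c, a without searching pg_attribute-style metadata for each position.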

These changes cause a problem in ExecTypeFromTLInternal: this function
calls CreateTemplateTupleDesc followed by TupleDescInitEntry, assuming
that attnum == attstoragepos for all tuples.  With the introduction of
storage positions, this of course will no longer be true.  I got
around this by having expand_targetlist, build_physical_tlist, and
build_relation_tlist make sure each TargetEntry (for targetlists
corresponding to either insert/update tuples, or base tuples pulled
straight from the heap) gets a correct resorigtbl and resname.  Then
ExecTypeFromTLInternal first tries calling a new function
TupleDescInitEntryAttr, which hands off to TupleDescInitEntry and then
performs a syscache lookup to update the storage position using the
resorigtbl.  This is a little ugly because ExecTypeFromTLInternal
doesn't know in advance what kind of tupledesc it's building, so it
needs to retreat to the old method whenever the syscache lookup fails,
but it was enough to pass the regression tests.  I could use some
advice on this - there's probably a better way to do it.

Another problem relates to upgrades.  With tools like pg_migrator now
on pgfoundry, people will eventually expect quick upgrades that don't
require rewriting each table's data.  Storage positions would cause a
problem for every version X -> version Y upgrade with Y >= 8.3, even
when X is also >= 8.3, because a version X table could always have
been altered without a rewrite into a structure different from what
Y's CREATE TABLE will choose.  I don't think it's as simple as just
using the above-mentioned ALTER TABLE extension to a
