Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-07-08 Thread Josh Berkus
On 06/26/2013 07:05 AM, Jamie Martin wrote:
 FYI I submitted a slightly modified patch since Amit's measurements that is 
 slightly faster. 

Yes.  My perspective is that this is a worthwhile optimization for a
minority, but fairly well-known, use case, provided that it doesn't
negatively impact any other, more common use case.  Potential cases
where I can see negative impact are:

A) normal table with a few, mostly non-null columns (recent pgbench
testing seems to have validated no measurable impact).

B) table with many (400+) mostly non-null columns

C) table with many (400+) mostly null columns, where column #390 was
null and gets updated to a not null value

I don't *think* that Jamie's performance tests have really addressed the
above cases.  However, do people agree that if performance on the patch
passes for all of A, B and C, then it's OK to apply?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-27 Thread Josh Berkus
All,

So, is this patch currently depending on performance testing, or not?
Like I said, it'll be a chunk of time to set up what I beleive is a
realistic performance test, so I don't want to do it if the patch is
likely to be bounced for other reasons.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-27 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 So, is this patch currently depending on performance testing, or not?
 Like I said, it'll be a chunk of time to set up what I beleive is a
 realistic performance test, so I don't want to do it if the patch is
 likely to be bounced for other reasons.

AFAICS, the threshold question here is whether the patch helps usefully
for tables with typical numbers of columns (ie, not 800 ;-)), and
doesn't hurt materially in any common scenario.  If it does, I think
we'd take it.  I've not read the patch, so I don't know if there are
minor cosmetic or correctness issues, but at bottom this seems to be a
very straightforward performance tradeoff.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-27 Thread Josh Berkus
On 06/27/2013 11:11 AM, Tom Lane wrote:
 AFAICS, the threshold question here is whether the patch helps usefully
 for tables with typical numbers of columns (ie, not 800 ;-)), and

I wouldn't expect it to help at all for  50 columns, and probably not
measurably below 200.

 doesn't hurt materially in any common scenario.  If it does, I think

Yeah, I think that's be bigger question.  Ok, I'll start working on a
new test case.  Here's my thinking on performance tests:

1. run pgbench 10 times both with and without the patch.  See if there's
any measurable difference.  There should not be.

2. Run with/without comparisons for the following scenarios:

Each table would have a SERIAL pk, a timestamp, and:

Chains of booleans:
# attributesNULL probability
16  0%  50% 90%
128 0%  50% 90%
512 0%  50% 90%

Chains of text:
16  0%  50% 90%
256 0%  50% 90%

... for 100,000 rows each.

Comparisons would include:

1) table size before and after testing
2) time required to read 1000 rows, by key
3) time required to read rows with each of
100 random column positions = some value
4) time required to insert 1000 rows
5) time required to update 1000 rows

Geez, I feel tired just thinking about it.  Jamie, can your team do any
of this?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-26 Thread Jamie Martin
FYI I submitted a slightly modified patch since Amit's measurements that is 
slightly faster. 

On Jun 25, 2013, at 1:26 PM, Amit Kapila amit.kap...@huawei.com wrote:

 On Monday, June 24, 2013 8:20 PM Tom Lane wrote:
 Amit Kapila amit.kap...@huawei.com writes:
 I will summarize the results, and if most of us feel that they are
 not good
 enough, then we can return this patch.
 
 Aside from the question of whether there's really any generally-useful
 performance improvement from this patch, it strikes me that this change
 forecloses other games that we might want to play with interpretation
 of
 the value of a tuple's natts.
 
 In particular, when I was visiting Salesforce last week, the point came
 up that they'd really like ALTER TABLE ADD COLUMN to be free even
 when
 the column had a non-null DEFAULT.  It's not too difficult to imagine
 how we might support that, at least when the default expression is a
 constant: decree that if the requested attribute number is beyond
 natts,
 look aside at the tuple descriptor to see if the column is marked as
 having a magic default value, and if so, substitute that rather than
 just returning NULL.  (This has to be a magic value separate from
 the current default, else a subsequent DROP or SET DEFAULT would do
 the wrong thing.)
 
 However, this idea conflicts with an optimization that supposes it can
 reduce natts to suppress null columns: if the column was actually
 stored
 as null, you'd lose that information, and would incorrectly return the
 magic default on subsequent reads.
 
 I think it might be possible to make both ideas play together, by
 not reducing natts further than the last column with a magic default.
 However, that means extra complexity in heap_form_tuple, which would
 invalidate the performance measurements done in support of this patch.
 
  It can have slight impact on normal scenario's, but I am not sure how much
 because
  the change will be very less(may be one extra if check and one assignment)
 
  For this Patch's scenario, I think the major benefit for Performance is in
 heap_fill_tuple() where the 
  For loop is reduced. However added some logic in heap_form_tuple can
 reduce the performance improvement, 
  but there can still be space saving benefit. 
 
 With Regards,
 Amit Kapila.
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-25 Thread Amit Kapila
On Monday, June 24, 2013 8:20 PM Tom Lane wrote:
 Amit Kapila amit.kap...@huawei.com writes:
  I will summarize the results, and if most of us feel that they are
 not good
  enough, then we can return this patch.
 
 Aside from the question of whether there's really any generally-useful
 performance improvement from this patch, it strikes me that this change
 forecloses other games that we might want to play with interpretation
 of
 the value of a tuple's natts.
 
 In particular, when I was visiting Salesforce last week, the point came
 up that they'd really like ALTER TABLE ADD COLUMN to be free even
 when
 the column had a non-null DEFAULT.  It's not too difficult to imagine
 how we might support that, at least when the default expression is a
 constant: decree that if the requested attribute number is beyond
 natts,
 look aside at the tuple descriptor to see if the column is marked as
 having a magic default value, and if so, substitute that rather than
 just returning NULL.  (This has to be a magic value separate from
 the current default, else a subsequent DROP or SET DEFAULT would do
 the wrong thing.)
 
 However, this idea conflicts with an optimization that supposes it can
 reduce natts to suppress null columns: if the column was actually
 stored
 as null, you'd lose that information, and would incorrectly return the
 magic default on subsequent reads.
 
 I think it might be possible to make both ideas play together, by
 not reducing natts further than the last column with a magic default.
 However, that means extra complexity in heap_form_tuple, which would
 invalidate the performance measurements done in support of this patch.

  It can have slight impact on normal scenario's, but I am not sure how much
because
  the change will be very less(may be one extra if check and one assignment)

  For this Patch's scenario, I think the major benefit for Performance is in
heap_fill_tuple() where the 
  For loop is reduced. However added some logic in heap_form_tuple can
reduce the performance improvement, 
  but there can still be space saving benefit. 

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Amit Kapila
On Monday, June 24, 2013 5:48 PM Andres Freund wrote:
 On 2013-01-25 09:06:12 +0530, Amit Kapila wrote:
  On 01/25/2013 03:43 AM, Jameison Martin wrote:
   there have been a lot of different threads on this patch. i'm going
 to
  take a stab at  teasing them out, summarizing them, and addressing
 them
  individually.
 
   Is this patch on the CF app? I can't seem to find it in 2013-01 or
  2013-next, and I
   wanted to add your summary.
 
  It is in previous CF (2012-11) in Returned with Feedback section
  https://commitfest.postgresql.org/action/commitfest_view?id=16
 
 I'd argue that the patch ought to be still marked as Returned with
 Feedback instead of again being marked as Needs Review. I don't
 really see anything new that has come up?

You are right that nothing new has come up. The major reason why this patch
could not go into 9.3 was that it is not clearly evident whether
Performance gains of this patch are sufficient to take risks (Tom points out
that bugs caused by such changes can be difficult to identify) of committing
this code. 

I will summarize the results, and if most of us feel that they are not good
enough, then we can return this patch.

Observations from Performance Results for tables with less columns

1. Approximately 13% space savings for table with 12 columns, out of which
last 4 are Nulls. 
   This table schema is such first 3 columns are integers and last 9 are
boolean's. 

Refer below link for detailed results:
http://www.postgresql.org/message-id/00b401cde1d8$746c90f0$5d45b2d0$@kapila@
huawei.com


Observations from Performance Results for tables with large columns

1. There is a visible performance increase when number of columns containing
NULLS are more than  60~70% in table have large number of columns. 
   Approximately 12% for table (having 800 cols, out of which 600 are nulls
OR having 300 columns, out of which 250 are nulls)   
2. There are visible space savings when number of columns containing NULLS
are more than  60~70% in table have large number of columns. 
   Approximately 11% for table (having 800 cols, out of which 600 are nulls
OR having 300 columns, out of which 250 are nulls)  

Refer below link for detailed results:
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853AA4
0@szxeml509-mbs


Observation from original pgbench
--

1. There is  1% performance dip for original pgbench, this could be due to
fluctuation in readings or some consequences of addition of code which is
difficult to reason out. 
Refer below link for detailed results:
http://www.postgresql.org/message-id/00bc01cde1f7$be4b4cb0$3ae1e610$@kapila@
huawei.com


With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Tom Lane
Amit Kapila amit.kap...@huawei.com writes:
 I will summarize the results, and if most of us feel that they are not good
 enough, then we can return this patch.

Aside from the question of whether there's really any generally-useful
performance improvement from this patch, it strikes me that this change
forecloses other games that we might want to play with interpretation of
the value of a tuple's natts.

In particular, when I was visiting Salesforce last week, the point came
up that they'd really like ALTER TABLE ADD COLUMN to be free even when
the column had a non-null DEFAULT.  It's not too difficult to imagine
how we might support that, at least when the default expression is a
constant: decree that if the requested attribute number is beyond natts,
look aside at the tuple descriptor to see if the column is marked as
having a magic default value, and if so, substitute that rather than
just returning NULL.  (This has to be a magic value separate from
the current default, else a subsequent DROP or SET DEFAULT would do
the wrong thing.)

However, this idea conflicts with an optimization that supposes it can
reduce natts to suppress null columns: if the column was actually stored
as null, you'd lose that information, and would incorrectly return the
magic default on subsequent reads.

I think it might be possible to make both ideas play together, by
not reducing natts further than the last column with a magic default.
However, that means extra complexity in heap_form_tuple, which would
invalidate the performance measurements done in support of this patch.

In short, there's no free lunch ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Simon Riggs
On 24 June 2013 15:49, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kap...@huawei.com writes:
 I will summarize the results, and if most of us feel that they are not good
 enough, then we can return this patch.

 Aside from the question of whether there's really any generally-useful
 performance improvement from this patch, it strikes me that this change
 forecloses other games that we might want to play with interpretation of
 the value of a tuple's natts.

 In particular, when I was visiting Salesforce last week, the point came
 up that they'd really like ALTER TABLE ADD COLUMN to be free even when
 the column had a non-null DEFAULT.  It's not too difficult to imagine
 how we might support that, at least when the default expression is a
 constant: decree that if the requested attribute number is beyond natts,
 look aside at the tuple descriptor to see if the column is marked as
 having a magic default value, and if so, substitute that rather than
 just returning NULL.  (This has to be a magic value separate from
 the current default, else a subsequent DROP or SET DEFAULT would do
 the wrong thing.)

Now that is a mighty fine idea.

 However, this idea conflicts with an optimization that supposes it can
 reduce natts to suppress null columns: if the column was actually stored
 as null, you'd lose that information, and would incorrectly return the
 magic default on subsequent reads.

 I think it might be possible to make both ideas play together, by
 not reducing natts further than the last column with a magic default.
 However, that means extra complexity in heap_form_tuple, which would
 invalidate the performance measurements done in support of this patch.

 In short, there's no free lunch ...

Agreed.

I think its rather a shame that the proponents of this patch have
tried so hard to push this through without working variations on the
theme. Please guys, go away and get creative; rethink the approach so
that you can have your lunch without anybody else losing theirs. I
would add that I have seen the use case and want to support it, but
we're looking to add to the codebase not just steal small bites of
performance from existing use cases.

As a practical suggestion, why not change the macro so it doesn't even
try to do anything different unless the number of columns is  72. A
constant comparison should go very quickly and isolate the code better
from the more typical code path. If not, lets try some other ideas,
like Tom just did.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Robert Haas
On Mon, Jun 24, 2013 at 4:05 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I think its rather a shame that the proponents of this patch have
 tried so hard to push this through without working variations on the
 theme. Please guys, go away and get creative; rethink the approach so
 that you can have your lunch without anybody else losing theirs. I
 would add that I have seen the use case and want to support it, but
 we're looking to add to the codebase not just steal small bites of
 performance from existing use cases.

If there's an actual performance consequence of applying this patch,
then I think that's a good reason for rejecting it.  But if the best
argument we can come up with is that we might someday try to do even
more clever things with the tuple's natts value, I guess I'm not very
impressed.  The reason why we have to rewrite the table when someone
adds a column with a non-NULL default is because we need to store the
new value in every row.  Sure, we could defer that in this particular
case.  But users might be mighty dismayed to see CLUSTER or VACUUM
FULL -- or a dump-and-reload! -- cause the table to become much LARGER
than it was before.  Having some kind of column-oriented compression
would be darn nifty, but this particular path doesn't seem
particularly promising to me.

I think the larger and more fundamental problem with natts is that
there's just not very much bit space available there.  Even as it is,
someone who adds and drops columns with any regularity will eventually
hit the wall, and there aren't any good alternatives at that point.
Were there more bit space available here, we could even do things like
allow some special cases of ALTER TABLE .. SET DATA TYPE not to
rewrite the table; natts could become a sort of tuple version number,
and we'd remember how to convert to newer versions on the fly.  But
with only 2048 values available, it's not really feasible to start
consuming them for any more operations than we already do.  I'm
generally a big fan of the principle that no single patch should be
allowed to crowd out room for future extensibility in this particular
area, but in this case I think the bit space available is *already* so
limited that we're not likely to get much further with it without
changing the storage format.

So, Tom, how's that pluggable storage format coming?  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Josh Berkus
Simon,

 I think its rather a shame that the proponents of this patch have
 tried so hard to push this through without working variations on the
 theme. Please guys, go away and get creative; rethink the approach so
 that you can have your lunch without anybody else losing theirs. I
 would add that I have seen the use case and want to support it, but
 we're looking to add to the codebase not just steal small bites of
 performance from existing use cases.

Do we really have ironclad numbers which show that the patch affects
performance negatively?  I didn't think the previous performance test
was comprehensive; I was planning to develop one myself this week.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 If there's an actual performance consequence of applying this patch,
 then I think that's a good reason for rejecting it.  But if the best
 argument we can come up with is that we might someday try to do even
 more clever things with the tuple's natts value, I guess I'm not very
 impressed.  The reason why we have to rewrite the table when someone
 adds a column with a non-NULL default is because we need to store the
 new value in every row.  Sure, we could defer that in this particular
 case.  But users might be mighty dismayed to see CLUSTER or VACUUM
 FULL -- or a dump-and-reload! -- cause the table to become much LARGER
 than it was before.  Having some kind of column-oriented compression
 would be darn nifty, but this particular path doesn't seem
 particularly promising to me.

The point of what I was suggesting isn't to conserve storage, but to
reduce downtime during a schema change.  Remember that a rewriting ALTER
TABLE locks everyone out of that table for a long time.

 So, Tom, how's that pluggable storage format coming?  :-)

Well, actually, it's looking to me like heap_form_tuple will be
underneath the API cut, because alternate storage managers will probably
have other tuple storage formats --- column stores being the poster
child here, but in any case the tuple header format is very unlikely
to be universal.

Which means that whether this patch gets applied to mainline is going
to be moot for Salesforce's purposes; they will certainly want the
equivalent logic in their storage code, because they've got tables with
many hundreds of mostly-null columns, but whether heap_form_tuple acts
this way or not won't affect them.

So unless we consider that many-hundreds-of-columns is a design center
for general purpose use of Postgres, we should be evaluating this patch
strictly on its usefulness for more typical table widths.  And my take
on that is that (1) lots of columns isn't our design center (for the
reasons you mentioned among others), and (2) the case for the patch
looks pretty weak otherwise.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Simon Riggs
On 24 June 2013 21:50, Tom Lane t...@sss.pgh.pa.us wrote:


  So, Tom, how's that pluggable storage format coming?  :-)

 Well, actually, it's looking to me like heap_form_tuple will be
 underneath the API cut, because alternate storage managers will probably
 have other tuple storage formats --- column stores being the poster
 child here, but in any case the tuple header format is very unlikely
 to be universal.


Hopefully we would allow multiple storage managers to be active at once,
one per table?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Jameison Martin
i believe the last submission of the patch had no negative performance impact 
on any of the tested use cases, but you'd have to go back and see the last 
exchange on that. 

i think it was the concern about potentially regressing the codeline in 
unforeseen ways without a clear benefit to all but one use case (wide tables) 
that stalled things.




 From: Josh Berkus j...@agliodbs.com
To: Simon Riggs si...@2ndquadrant.com 
Cc: Tom Lane t...@sss.pgh.pa.us; Amit Kapila amit.kap...@huawei.com; 
Andres Freund and...@2ndquadrant.com; Craig Ringer cr...@2ndquadrant.com; 
Jameison Martin jameis...@yahoo.com; Noah Misch n...@leadboat.com; Kevin 
Grittner kgri...@mail.com; robertmh...@gmail.com; 
pgsql-hackers@postgresql.org 
Sent: Monday, June 24, 2013 10:32 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap 
rows to reduce the size of the null bitmap [Review]
 

Simon,

 I think its rather a shame that the proponents of this patch have
 tried so hard to push this through without working variations on the
 theme. Please guys, go away and get creative; rethink the approach so
 that you can have your lunch without anybody else losing theirs. I
 would add that I have seen the use case and want to support it, but
 we're looking to add to the codebase not just steal small bites of
 performance from existing use cases.

Do we really have ironclad numbers which show that the patch affects
performance negatively?  I didn't think the previous performance test
was comprehensive; I was planning to develop one myself this week.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Josh Berkus
On 06/24/2013 01:50 PM, Tom Lane wrote:
 The point of what I was suggesting isn't to conserve storage, but to
 reduce downtime during a schema change.  Remember that a rewriting ALTER
 TABLE locks everyone out of that table for a long time.

Right, but I'm worried about the surprise! factor.  That is, if we
make the first default free by using a magic value, then a SET DEFAULT
on that column is going to have some very surprising results as suddenly
the whole table needs to get written out for the old default.  In many
use cases, this would still be a net win, since 80% of the time users
don't change defaults after column creation.  But we'd have to make it
much less surprising somehow.  Also for the reason Tom pointed out, the
optimization would only work on with NOT NULL columns ... leading to
another potential unexpected surprise when the column is marked NULLable.

 So unless we consider that many-hundreds-of-columns is a design center
 for general purpose use of Postgres, we should be evaluating this patch
 strictly on its usefulness for more typical table widths.  And my take
 on that is that (1) lots of columns isn't our design center (for the
 reasons you mentioned among others), and (2) the case for the patch
 looks pretty weak otherwise.

Well, actually, hundreds of columns is reasonably common for a certain
user set (ERP, CRM, etc.).  If we could handle that use case very
efficiently, then it would win us some users, since other RDMBSes don't.
 However, there are multiple issues with having hundreds of columns, of
which storage optimization is only one ... and probably the smallest one
at that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 06/24/2013 01:50 PM, Tom Lane wrote:
 The point of what I was suggesting isn't to conserve storage, but to
 reduce downtime during a schema change.  Remember that a rewriting ALTER
 TABLE locks everyone out of that table for a long time.

 Right, but I'm worried about the surprise! factor.  That is, if we
 make the first default free by using a magic value, then a SET DEFAULT
 on that column is going to have some very surprising results as suddenly
 the whole table needs to get written out for the old default.

No, that's why we'd store the magic default separately.  That will be
permanent and unaffected by later SET DEFAULT operations.  (This
requires that every subsequently created tuple store the column
explicitly so that the magic default doesn't affect it; which is exactly
why there's a conflict with the currently-proposed patch.)

 ...  Also for the reason Tom pointed out, the
 optimization would only work on with NOT NULL columns ... leading to
 another potential unexpected surprise when the column is marked NULLable.

Huh?  We already have the case of null default handled.

 Well, actually, hundreds of columns is reasonably common for a certain
 user set (ERP, CRM, etc.).  If we could handle that use case very
 efficiently, then it would win us some users, since other RDMBSes don't.
  However, there are multiple issues with having hundreds of columns, of
 which storage optimization is only one ... and probably the smallest one
 at that.

Agreed; there are a lot of things we'd have to address if we really
wanted to claim this is a domain we work well in.  (I suspect Salesforce
will be chipping away at some of those issues, but as I said,
heap_form_tuple is not in their critical path.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Josh Berkus
On 06/24/2013 02:21 PM, Tom Lane wrote:
 Right, but I'm worried about the surprise! factor.  That is, if we
 make the first default free by using a magic value, then a SET DEFAULT
 on that column is going to have some very surprising results as suddenly
 the whole table needs to get written out for the old default.
 
 No, that's why we'd store the magic default separately.  That will be
 permanent and unaffected by later SET DEFAULT operations.  (This
 requires that every subsequently created tuple store the column
 explicitly so that the magic default doesn't affect it; which is exactly
 why there's a conflict with the currently-proposed patch.)

Yah.  So how likely is this to get done sometime in the next 2 releases?
 It's only a conflict if someone is going to write the code ...

 Agreed; there are a lot of things we'd have to address if we really
 wanted to claim this is a domain we work well in.  (I suspect Salesforce
 will be chipping away at some of those issues, but as I said,
 heap_form_tuple is not in their critical path.)

Well, doing the trailing column truncation as part of a general plan to
make having 600 columns less painful would make more sense than doing it
on its own.  For my personal use case(s), I don't really care about the
null bitmap that much; that amount of space simply isn't that
significant compared to the other performance issues involved.  I
started evaluating this patch just because I'm one of the few people
with a realistic test case.

Anyway, let's decide if acceptance of this patch hinges on performance
or not.  I'll take me a whole evening to set up a good performance test,
so I don't want to do it if it's going to be a moot point.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Robert Haas
On Mon, Jun 24, 2013 at 4:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The point of what I was suggesting isn't to conserve storage, but to
 reduce downtime during a schema change.  Remember that a rewriting ALTER
 TABLE locks everyone out of that table for a long time.

Sure, I understand.  As Josh says, though, it'd still be potentially
quite surprising.

 So, Tom, how's that pluggable storage format coming?  :-)

 Well, actually, it's looking to me like heap_form_tuple will be
 underneath the API cut, because alternate storage managers will probably
 have other tuple storage formats --- column stores being the poster
 child here, but in any case the tuple header format is very unlikely
 to be universal.

I've had the same thought.  Unfortunately, there are a lot of things -
like the TupleTableSlot abstraction - that are fairly deeply in bed
with the format used by our current heap AM; so we might be imposing a
performance overhead for anyone who uses some other representation.
Unless you have some idea how to avoid that?

 Which means that whether this patch gets applied to mainline is going
 to be moot for Salesforce's purposes; they will certainly want the
 equivalent logic in their storage code, because they've got tables with
 many hundreds of mostly-null columns, but whether heap_form_tuple acts
 this way or not won't affect them.

OK.

 So unless we consider that many-hundreds-of-columns is a design center
 for general purpose use of Postgres, we should be evaluating this patch
 strictly on its usefulness for more typical table widths.  And my take
 on that is that (1) lots of columns isn't our design center (for the
 reasons you mentioned among others), and (2) the case for the patch
 looks pretty weak otherwise.

I guess I have yet to be convinced that it really hurts anything.
Your example seems fairly hypothetical and could easily be something
no one ever implements.  I don't feel terribly strongly that we need
to take this patch, and I don't really care that much whether we do or
not; I'm just not really convinced there's any actual evidence that we
shouldn't.  The standard isn't that it's got to make something really
important better; it's just got to make something better without
making anything more important worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Noah Misch
On Mon, Jun 24, 2013 at 01:32:41PM -0700, Josh Berkus wrote:
 Do we really have ironclad numbers which show that the patch affects
 performance negatively?  I didn't think the previous performance test
 was comprehensive; I was planning to develop one myself this week.

Not ironclad, no:
http://www.postgresql.org/message-id/20130124021205.gb29...@tornado.leadboat.com

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-01-24 Thread Amit Kapila
On Thursday, January 24, 2013 7:42 AM Noah Misch wrote:
 On Wed, Jan 09, 2013 at 11:22:06AM +, Simon Riggs wrote:
  On 24 December 2012 16:57, Amit Kapila amit.kap...@huawei.com
 wrote:
   Performance: Average of 3 runs of pgbench in tps
   9.3devel  |  with trailing null patch
   --+--
   578.9872  |   573.4980
 
  On balance, it would seem optimizing for this special case would
  affect everybody negatively; not much, but enough. Which means we
  should rekect this patch.
 
  Do you have a reason why a different view might be taken?
 
 We've mostly ignored performance changes of a few percentage points,
 because
 the global consequences of adding or removing code to the binary image
 can
 easily change performance that much.  It would be great to get to the
 point
 where we can reason about 1% performance changes, but we're not there.
 If a
 measured +1% performance change would have yielded a different
 decision, we
 should rethink the decision based on more-robust criteria.

Today, I had taken one more set of readings of original pg_bench which are
below in this mail. The difference is that this set of readings are on Suse
11 and with Shared buffers - 4G
IMO, the changes in this patch are not causing any regression, however it
gives performance/size reduction
in some of the cases as shown by data in previous mails.
So the point to decide is whether the usecases in which it is giving benefit
are worth to have this Patch?
As Tom had already raised some concern's about the code, I think the Patch
can only have merit if the usecase
makes sense to people.

System Configuration: 
Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz)   RAM : 24GB 
Operating System: Suse-Linux 11.2 x86_64   

Sever Configuration: 
The database cluster will be initialized with locales 
  COLLATE:  C 
  CTYPE:C 
  MESSAGES: en_US.UTF-8 
  MONETARY: en_US.UTF-8 
  NUMERIC:  en_US.UTF-8 
  TIME: en_US.UTF-8 

shared_buffers = 4GB 
checkpoint_segments = 255   
checkpoint_timeout = 10min   

pgbench: 
transaction type: TPC-B (sort of) 
scaling factor: 75 
query mode: simple 
number of clients: 4 
number of threads: 4 
duration: 300 s 

   originalpatch   
 Run-1: 311 312 
 Run-2: 307 302 
 Run-3: 300 312 
 Run-4: 285 295 
 Run-5: 308 305 
  
 Avg  : 302.2  305.2


With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-01-24 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 That said I don't know just how common that usage pattern is. And I'm
 not sure how many of those people would be able to arrange for the
 null columns to land at the end of the row.

Obviously, they can't arrange that all the time (else their trailing
columns are unnecessary).  It'd have to be something like ordering the
columns in descending probability of being not-null --- and unless the
later columns *all* have very small probability of being not-null,
you're not gonna win much.  Much as I hate to suggest that somebody
consider EAV-like storage, one does have to wonder whether a schema
like that doesn't need redesign more than it needs an implementation
tweak.

 It's a bit frustrating because it does seem like if it had been
 written this way to begin with nobody would every have questioned
 whether it was a good idea and nobody would ever have suggested
 ripping it out.

True, but we'd also have a much higher probability that there aren't
latent bugs because of expectations that t_natts is trustworthy.
It's the cost of getting from here to there that's scaring me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers