Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-06-24 Thread Jameison Martin
i believe the last submission of the patch had no negative performance impact 
on any of the tested use cases, but you'd have to go back and see the last 
exchange on that. 

i think it was the concern about potentially regressing the codeline in 
unforeseen ways without a clear benefit to all but one use case (wide tables) 
that stalled things.



>
> From: Josh Berkus 
>To: Simon Riggs  
>Cc: Tom Lane ; Amit Kapila ; 
>Andres Freund ; Craig Ringer ; 
>Jameison Martin ; Noah Misch ; Kevin 
>Grittner ; robertmh...@gmail.com; 
>pgsql-hackers@postgresql.org 
>Sent: Monday, June 24, 2013 10:32 PM
>Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap 
>rows to reduce the size of the null bitmap [Review]
> 
>
>Simon,
>
>> I think its rather a shame that the proponents of this patch have
>> tried so hard to push this through without working variations on the
>> theme. Please guys, go away and get creative; rethink the approach so
>> that you can have your lunch without anybody else losing theirs. I
>> would add that I have seen the use case and want to support it, but
>> we're looking to add to the codebase not just steal small bites of
>> performance from existing use cases.
>
>Do we really have ironclad numbers which show that the patch affects
>performance negatively?  I didn't think the previous performance test
>was comprehensive; I was planning to develop one myself this week.
>
>-- 
>Josh Berkus
>PostgreSQL Experts Inc.
>http://pgexperts.com
>
>
>

Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]

2013-01-22 Thread Jameison Martin
Sorry for the late response, I just happened to see this yesterday.

Running a general benchmark against the patch as Keven suggests is a good idea. 

Amit, can you supply the actual values you saw when running pgbench (the 3 
values for each run)? I'd like to verify that the 1% difference isn't due to 
some file system/OS variability (would be interested in what the stdev is for 
the values). Also, do you happen to have some information about the hardware 
you ran on?

Meanwhile, I'll rerun on my end to see if I can reproduce your numbers. And 
I'll get a run of pgbench with minimal I/O since that can induce a lot of 
variability.

Also, Amit, I want to thank you for all your work on this patch, it is greatly 
appreciated.

Thanks

-Jamie



On Monday, December 24, 2012 7:58 PM Kevin Grittner wrote: 
Simon Riggs wrote: 
>Not really sure about the 100s of columns use case. But showing gain in useful 
>places in these more common cases wins
my vote. Thanks for testing. Barring objections, will commit. 
>Do we have any results on just a plain, old stock pgbench run, with
the default table definitions? That would be a reassuring complement to the 
other tests. 
Sever Configuration: 
The database cluster will be initialized with locales  COLLATE:  C  CTYPE:C 
 MESSAGES: en_US.UTF-8  MONETARY: en_US.UTF-8  NUMERIC:  en_US.UTF-8  TIME: 
en_US.UTF-8  shared_buffers = 1GB 
checkpoint_segments = 255   
checkpoint_timeout = 15minpgbench: 
transaction type: TPC-B (sort of) 
scaling factor: 75 
query mode: simple 
number of clients: 8 
number of threads: 8 
duration: 600 s  Performance: Average of 3 runs of pgbench in tps 
9.3devel  |  with trailing null patch 
--+-- 
578.9872  |   573.4980 With Regards,
Amit Kapila. 

Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-08-09 Thread Jameison Martin
Simon, Tom is correct, the patch doesn't change the existing row format 
contract or the format of the null bitmap. The change only affects how new rows 
are written out. And it uses the same supported format that has always been 
there (which is why alter table add col null works the way it does). And it 
keeps to the same MAXALIGN boundaries that are there today. 

One could argue that different row formats could make sense in different 
circumstances, and I'm certainly open to that kind of discussion, but this 
change is far more modest and perhaps can be made on its own since it 
doesn't perturb the code base much, improves performance (marginally) and 
improves the size of rows with lots of trailing nulls.

[separate topic: pluggable heap manager]
I'm quite interested in pursuing more aggressive compression strategies, and 
I'd like to do so in the context of the heap manager. I'm exploring having a 
pluggable heap manager implementation and would be interested in feedback on 
that as a general approach. My thinking is that I'd like to be able to have 
PostgreSQL support multiple heap implementations along the lines of how 
multiple index types are supported, though probably only the existing heap 
manager implementation would be part of the actual codeline. I've done a little 
exploratory work of looking at the heap interface. I was planning on doing a 
little prototyping before suggesting anything concrete, but, assuming the 
concept of a layered heap manager is not inherently objectionable, I was 
thinking of cleaning up the heap interface a little (e.g. some HOT stuff has 
bled across a little), then taking a whack at formalizing the interface along 
the lines of the index layering. So ideally I'd make a
 few separate submissions and if all goes according to plan I'd be able to have 
a pluggable heap manager implementation that I could work on independently and 
which could in theory use the same hooks as the existing heap implementation. 
And if it turns out that my implementation is deemed to be general enough it 
could be released to the community.

If I do decide to pursue this, can anyone suggest the best way solicit 
feedback? I see that some proposals get shared on the postgres wiki. I could 
put something up there to frame the issue and encourage some back and forth 
dialog. Or is email the way that this kind of exchange tends to happen? 
Ultimately I'd like to get into a bit of detail about what the actual heap 
manager contract is and so forth.

Note that I'm a ways from really knowing if this is feasible on my end, so this 
is quite speculative at this point. But I'd like to introduce the topic and get 
some feedback on the right way to communicate as early as possible.

Thanks.

-Jamie


________
 From: Tom Lane 
To: Simon Riggs  
Cc: Jameison Martin ; "pgsql-hackers@postgresql.org" 
 
Sent: Thursday, August 9, 2012 7:27 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows 
to reduce the size of the null bitmap
 
Simon Riggs  writes:
> On 17 April 2012 17:22, Jameison Martin  wrote:
>> The following patch truncates trailing null attributes from heap rows to
>> reduce the size of the row bitmap.

> This is an interesting patch, but its has had various comments made about it.

> When I look at this I see that it would change the NULL bitmap for all
> existing rows, which means it forces a complete unload/reload of data.

Huh?  I thought it would only change how *new* tuples were stored.
Old tuples ought to continue to work fine.

I'm not really convinced that it's a good idea in the larger scheme
of things --- your point in a nearby thread that micro-optimizing
storage space at the expense of all else is not good engineering
applies here.  But I don't see that it forces data reload.  Or if
it does, that should be easily fixable.

> ...  Have another flag which indicates
> when a partial trailing col trimmed NULL bitmap is in use.

That might be useful for forensic purposes, but on the whole I suspect
it's just added complexity (and eating up a valuable infomask bit)
for relatively little gain.

> ... decide whether a table will benefit from full or partial bitmap and
> set that in the tupledesc. That way the tupledesc will show
> heap_form_tuple which kind of null bitmap is preferred for new tuples.
> That preference might be settable by user on or off, but the default
> would be for postgres to decide that for us based upon null stats etc,
> which we would decide at ANALYZE time.

And that seems like huge overcomplication.  I think we could probably
do fine with some very simple fixed policy, like "don't bother with
this for tables of less than N columns", where N is maybe 64 or so
and chosen to match the MAXALIGN boundary where there actually could
be some savings from trimming the null bitmap.

(Note: I've not read the patch, so maybe Jameison already did something
of the sort.)

            regards, tom lane

Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-26 Thread Jameison Martin
Simon and Greg,

The math on space savings is assuming that columns will be used roughly from 
first to last as declared in the DDL, not a random distribution of column 
values. This is the case for the particular schema that I'm looking at. I'm not 
asserting that it is the common case in general, though it may be more common 
than not given the fact that several commercial databases optimize for trailing 
null column values and developers often pay attention to this.

If there is a exact standard as to how this group does performance analysis 
(e.g. removing outliers beyond a certain standard deviation, number 
of repetitions, machine isolation requirements and so forth), please let me 
know. I can submit my results as is but in the interest of avoiding a lot of 
duplicate postings perhaps someone can point me to an example of what kinds of 
numbers are desired so I can make sure my posting conforms to that. For what it 
is worth I ran the 3 tests 10 times each and removed the outliers, but I can 
run 100 times or do something different if need be (e.g. post a csv for easy 
consumption in a spreadsheet). Also, Simon, you mentioned posting "environment 
notes", can you let me know what kind of environment notes are desired? For 
example, are you thinking about changes to the vanilla postgresql.conf, 
hardware information, OS config, etc?

Greg, all I'm trying to establish is that this change doesn't hurt insert 
performance for the common case as per Tom's comments. I'll try to add some 
additional test cases with varying trailing null column values to see if we can 
establish the potential salutary effect with a bit more data, but I'm not 
actually asserting that this significant or is a justification for the patch. 
It would be interesting to see what the performance benefit is with real 
queries against rows that have much smaller bitmaps, but I'd prefer not to get 
into that.

As for proof of the size reduction, I'd actually like to codify something in a 
regression test to ensure there are no regressions in the behavior of the 
patch. I was a little leery of creating a regression test that is dependent on 
internals that might cause the test to break over time, so I punted on it. Does 
anyone have a good suggestion as to a safe way to codify that the proposed 
behavioral change is working as intended in the form of a test that is unlikely 
to break over time? The best thing I could come up with was to create a very 
wide table and insert some sparse rows (trailing nulls) and verify that the 
pages. In any event, I'll also post a comparative relation size number and test 
as well.

Cheers.

-Jamie


____
 From: Simon Riggs 
To: Jameison Martin  
Cc: Tom Lane ; "pgsql-hackers@postgresql.org" 
 
Sent: Thursday, April 26, 2012 12:27 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows 
to reduce the size of the null bitmap
 
On Thu, Apr 26, 2012 at 1:35 AM, Jameison Martin  wrote:
> Tom, I whipped up some  INSERT/SELECT tests where I selected into a
> temporary table as you suggested. The target temporary table and the source
> table were in cache and I basically disabled things that would cause noise.
> The source table had 5 integer columns, and was populated with 10 million
> rows.
>
> I tried 3 variations:
>   1) target has all nullable columns, all set to non null values: the
> results were the same
>   2) target has all nullable columns, only the first column is set: the
> patch was slightly faster
>   3) target has all non-null columns: the patch maybe was slightly faster,
> probably not statistically relevant
>
> By slightly faster I'm talking on order of 10 nanoseconds per row.
>
> I think #2 is explained by the reduction in loop iterations in
> heap_fill_tuple().

I see this as a useful use case that I have come across in a few
cases, most typically associated with very large databases.

It will be a win in those cases, but I think your maths is unrealistic
for the common case. In your case, you're saying that you have 750
trailing null columns that will be all-NULL in 90% of cases. Given a
randomly distributed set of col values, I'd expect the last NULL to be
on average around the 400th column, perhaps more. So the savings are
still high, but not as high in the general case as it is for you.

The performance tests Tom asks for are essential, otherwise we cannot
proceed. Thanks for starting those.

Please post your test code, any environment notes and your exact test
results. The important point is that we need objectively confirmable
tests, not just your word it was faster. Everybody is held to the same
level of proof here, so its not a personal doubt.

It would be useful to post sizes of databases also, to confirm that
the patch really does reduce database size.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-25 Thread Jameison Martin
Tom, I whipped up some  INSERT/SELECT tests where I selected into a temporary 
table as you suggested. The target temporary table and the source table were in 
cache and I basically disabled things that would cause noise. The source table 
had 5 integer columns, and was populated with 10 million rows.

I tried 3 variations:
  1) target has all nullable columns, all set to non null values: the results 
were the same
  2) target has all nullable columns, only the first column is set: the patch 
was slightly faster
  3) target has all non-null columns: the patch maybe was slightly faster, 
probably not statistically relevant


By slightly faster I'm talking on order of 10 nanoseconds per row.

I think #2 is explained by the reduction in loop iterations in 
heap_fill_tuple(). 



 From: Tom Lane 
To: Jameison Martin  
Cc: "pgsql-hackers@postgresql.org"  
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows 
to reduce the size of the null bitmap 
 
Jameison Martin  writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 
> columns, most of which have only the first 50 or so values set. 800 columns 
> would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte 
> alignment the row bitmap would take up 104 bytes with the current 
> implementation. If only the first 50 or so columns are actually non-null, 
> then the minimum bitmap size wouldn't need to be more than 8 bytes, which 
> means the proposed change would save 96 bytes. For the data set I have in 
> mind roughly 90% of the rows would fall into the category of needing only 8 
> bytes for the null bitmap.

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net 
> loss) for typical cases? Are you interested in some insert performance tests? 
> Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.

            regards, tom lane

Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Jameison Martin
Regarding the schema: I'm afraid the schema cannot be changed at this point, 
though I appreciate
the suggestions. 

Regarding an INSERT performance test, what kind of table shape would you like 
me to exercise? 
The patch as submitted may actually shave some cycles off of the insertion of 
rows with trailing nulls even 
when there are less than 64 columns because it avoids iterating over the null 
columns a 2nd time in heap_fill_tuple(), 
so I want to be sure that I pick something that you feel is properly 
representative. 

Thanks.

-Jamie



 From: Tom Lane 
To: Jameison Martin  
Cc: "pgsql-hackers@postgresql.org"  
Sent: Tuesday, April 17, 2012 9:57 PM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows 
to reduce the size of the null bitmap 
 
Jameison Martin  writes:
> The use-case I'm targeting is a schema that has multiple tables with ~800 
> columns, most of which have only the first 50 or so values set. 800 columns 
> would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte 
> alignment the row bitmap would take up 104 bytes with the current 
> implementation. If only the first 50 or so columns are actually non-null, 
> then the minimum bitmap size wouldn't need to be more than 8 bytes, which 
> means the proposed change would save 96 bytes. For the data set I have in 
> mind roughly 90% of the rows would fall into the category of needing only 8 
> bytes for the null bitmap.

I can't help thinking that (a) this is an incredibly narrow use-case,
and (b) you'd be well advised to rethink your schema design anyway.
There are a whole lot of inefficiencies associated with having that many
columns; the size of the null bitmap is probably one of the smaller
ones.  I don't really want to suggest an EAV design, but perhaps some of
the columns could be collapsed into arrays, or something like that?

> What kind of test results would prove that this is a net win (or not a net 
> loss) for typical cases? Are you interested in some insert performance tests? 
> Also, how would you define a typical case (e.g. what kind of data shape)?

Hmm, well, most of the tables I've seen have fewer than 64 columns, so
that the probability of win is exactly zero.  Which would mean that
you've got to demonstrate that the added overhead is unmeasurably small.
Which maybe you can do, because there's certainly plenty of cycles
involved in a tuple insertion, but we need to see the numbers.
I'd suggest an INSERT/SELECT into a temp table as probably stressing
tuple formation speed the most.  Or maybe you could write a C function
that just exercises heap_form_tuple followed by heap_freetuple in a
tight loop --- if there's no slowdown measurable in that context, then
a fortiori we don't have to worry about it in the real world.

            regards, tom lane

Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Jameison Martin
Thanks for the response.



The use-case I'm targeting is a schema that has multiple tables with ~800 
columns, most of which have only the first 50 or so values set. 800 columns 
would require 800 bits in a bitmap which equates to 100 bytes. With 8-byte 
alignment the row bitmap would take up 104 bytes with the current 
implementation. If only the first 50 or so columns are actually non-null, then 
the minimum bitmap size wouldn't need to be more than 8 bytes, which means the 
proposed change would save 96 bytes. For the data set I have in mind roughly 
90% of the rows would fall into the category of needing only 8 bytes for the 
null bitmap.


What kind of test results would prove that this is a net win (or not a net 
loss) for typical cases? Are you interested in some insert performance tests? 
Also, how would you define a typical case (e.g. what kind of data shape)?

Thanks.
-jamie



 From: Tom Lane 
To: Jameison Martin  
Cc: "pgsql-hackers@postgresql.org"  
Sent: Tuesday, April 17, 2012 9:38 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows 
to reduce the size of the null bitmap 
 
Jameison Martin  writes:
> The following patch truncates trailing null attributes from heap rows to 
> reduce the size of the row bitmap. 

This has been discussed before, but it always seemed that the
cost-benefit ratio was exceedingly questionable.  You don't get any
savings whatsoever unless you reduce the size of the null bitmap across
a MAXALIGN boundary, which more and more often is 64 bits, so that the
frequency with which the optimization wins anything doesn't look likely
to be that high.  And on the other side of the coin, you're adding
cycles to every single tuple-construction operation to make this work.
The introduction of bugs doesn't seem improbable either.  (Just because
tuples in user tables might have unexpected natts values doesn't mean
that the code is, or should be, prepared to
 tolerate that in system
tables or plan-constructed tuples.)

So what I'd like to see is some concrete test results proving that this
is a net win, or at least not a net loss, for typical cases.  Just
asserting that it might be a win for certain usage patterns doesn't do
it for me.

            regards, tom lane

[HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-17 Thread Jameison Martin
The following patch truncates trailing null attributes from heap rows to reduce 
the size of the row bitmap. 


Applications often have wide rows in which many of the trailing column values 
are null. On an insert/update, all of the trailing null columns are tracked in 
the row bitmap. This can add a substantial overhead for very wide rows. This 
change truncates heap rows such that the trailing nulls are elided. 

The intuition for this change is that ALTER TABLE t ADD COLUMN c type NULL is a 
metadata only change. Postgres works fine when a row's metadata (tuple 
descriptor) is inconsistent with the actual row data: extra columns are assumed 
to be null. This change just adjusts the number of attributes for a row and the 
row bitmap to only track up to the last non-null attribute.

Thanks.

-Jamie Martin

0001-Truncate-trailing-null-attributes-from-heap-rows-to-.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers