Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 06/26/2013 07:05 AM, Jamie Martin wrote: FYI I submitted a slightly modified patch since Amit's measurements that is slightly faster. Yes. My perspective is that this is a worthwhile optimization for a minority, but fairly well-known, use case, provided that it doesn't negatively impact any other, more common use case. Potential cases where I can see negative impact are: A) normal table with a few, mostly non-null columns (recent pgbench testing seems to have validated no measurable impact). B) table with many (400+) mostly non-null columns C) table with many (400+) mostly null columns, where column #390 was null and gets updated to a not null value I don't *think* that Jamie's performance tests have really addressed the above cases. However, do people agree that if performance on the patch passes for all of A, B and C, then it's OK to apply? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
All, So, is this patch currently depending on performance testing, or not? Like I said, it'll be a chunk of time to set up what I beleive is a realistic performance test, so I don't want to do it if the patch is likely to be bounced for other reasons. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Josh Berkus j...@agliodbs.com writes: So, is this patch currently depending on performance testing, or not? Like I said, it'll be a chunk of time to set up what I beleive is a realistic performance test, so I don't want to do it if the patch is likely to be bounced for other reasons. AFAICS, the threshold question here is whether the patch helps usefully for tables with typical numbers of columns (ie, not 800 ;-)), and doesn't hurt materially in any common scenario. If it does, I think we'd take it. I've not read the patch, so I don't know if there are minor cosmetic or correctness issues, but at bottom this seems to be a very straightforward performance tradeoff. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 06/27/2013 11:11 AM, Tom Lane wrote: AFAICS, the threshold question here is whether the patch helps usefully for tables with typical numbers of columns (ie, not 800 ;-)), and I wouldn't expect it to help at all for 50 columns, and probably not measurably below 200. doesn't hurt materially in any common scenario. If it does, I think Yeah, I think that's be bigger question. Ok, I'll start working on a new test case. Here's my thinking on performance tests: 1. run pgbench 10 times both with and without the patch. See if there's any measurable difference. There should not be. 2. Run with/without comparisons for the following scenarios: Each table would have a SERIAL pk, a timestamp, and: Chains of booleans: # attributesNULL probability 16 0% 50% 90% 128 0% 50% 90% 512 0% 50% 90% Chains of text: 16 0% 50% 90% 256 0% 50% 90% ... for 100,000 rows each. Comparisons would include: 1) table size before and after testing 2) time required to read 1000 rows, by key 3) time required to read rows with each of 100 random column positions = some value 4) time required to insert 1000 rows 5) time required to update 1000 rows Geez, I feel tired just thinking about it. Jamie, can your team do any of this? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
FYI I submitted a slightly modified patch since Amit's measurements that is slightly faster. On Jun 25, 2013, at 1:26 PM, Amit Kapila amit.kap...@huawei.com wrote: On Monday, June 24, 2013 8:20 PM Tom Lane wrote: Amit Kapila amit.kap...@huawei.com writes: I will summarize the results, and if most of us feel that they are not good enough, then we can return this patch. Aside from the question of whether there's really any generally-useful performance improvement from this patch, it strikes me that this change forecloses other games that we might want to play with interpretation of the value of a tuple's natts. In particular, when I was visiting Salesforce last week, the point came up that they'd really like ALTER TABLE ADD COLUMN to be free even when the column had a non-null DEFAULT. It's not too difficult to imagine how we might support that, at least when the default expression is a constant: decree that if the requested attribute number is beyond natts, look aside at the tuple descriptor to see if the column is marked as having a magic default value, and if so, substitute that rather than just returning NULL. (This has to be a magic value separate from the current default, else a subsequent DROP or SET DEFAULT would do the wrong thing.) However, this idea conflicts with an optimization that supposes it can reduce natts to suppress null columns: if the column was actually stored as null, you'd lose that information, and would incorrectly return the magic default on subsequent reads. I think it might be possible to make both ideas play together, by not reducing natts further than the last column with a magic default. However, that means extra complexity in heap_form_tuple, which would invalidate the performance measurements done in support of this patch. It can have slight impact on normal scenario's, but I am not sure how much because the change will be very less(may be one extra if check and one assignment) For this Patch's scenario, I think the major benefit for Performance is in heap_fill_tuple() where the For loop is reduced. However added some logic in heap_form_tuple can reduce the performance improvement, but there can still be space saving benefit. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Monday, June 24, 2013 8:20 PM Tom Lane wrote: Amit Kapila amit.kap...@huawei.com writes: I will summarize the results, and if most of us feel that they are not good enough, then we can return this patch. Aside from the question of whether there's really any generally-useful performance improvement from this patch, it strikes me that this change forecloses other games that we might want to play with interpretation of the value of a tuple's natts. In particular, when I was visiting Salesforce last week, the point came up that they'd really like ALTER TABLE ADD COLUMN to be free even when the column had a non-null DEFAULT. It's not too difficult to imagine how we might support that, at least when the default expression is a constant: decree that if the requested attribute number is beyond natts, look aside at the tuple descriptor to see if the column is marked as having a magic default value, and if so, substitute that rather than just returning NULL. (This has to be a magic value separate from the current default, else a subsequent DROP or SET DEFAULT would do the wrong thing.) However, this idea conflicts with an optimization that supposes it can reduce natts to suppress null columns: if the column was actually stored as null, you'd lose that information, and would incorrectly return the magic default on subsequent reads. I think it might be possible to make both ideas play together, by not reducing natts further than the last column with a magic default. However, that means extra complexity in heap_form_tuple, which would invalidate the performance measurements done in support of this patch. It can have slight impact on normal scenario's, but I am not sure how much because the change will be very less(may be one extra if check and one assignment) For this Patch's scenario, I think the major benefit for Performance is in heap_fill_tuple() where the For loop is reduced. However added some logic in heap_form_tuple can reduce the performance improvement, but there can still be space saving benefit. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Monday, June 24, 2013 5:48 PM Andres Freund wrote: On 2013-01-25 09:06:12 +0530, Amit Kapila wrote: On 01/25/2013 03:43 AM, Jameison Martin wrote: there have been a lot of different threads on this patch. i'm going to take a stab at teasing them out, summarizing them, and addressing them individually. Is this patch on the CF app? I can't seem to find it in 2013-01 or 2013-next, and I wanted to add your summary. It is in previous CF (2012-11) in Returned with Feedback section https://commitfest.postgresql.org/action/commitfest_view?id=16 I'd argue that the patch ought to be still marked as Returned with Feedback instead of again being marked as Needs Review. I don't really see anything new that has come up? You are right that nothing new has come up. The major reason why this patch could not go into 9.3 was that it is not clearly evident whether Performance gains of this patch are sufficient to take risks (Tom points out that bugs caused by such changes can be difficult to identify) of committing this code. I will summarize the results, and if most of us feel that they are not good enough, then we can return this patch. Observations from Performance Results for tables with less columns 1. Approximately 13% space savings for table with 12 columns, out of which last 4 are Nulls. This table schema is such first 3 columns are integers and last 9 are boolean's. Refer below link for detailed results: http://www.postgresql.org/message-id/00b401cde1d8$746c90f0$5d45b2d0$@kapila@ huawei.com Observations from Performance Results for tables with large columns 1. There is a visible performance increase when number of columns containing NULLS are more than 60~70% in table have large number of columns. Approximately 12% for table (having 800 cols, out of which 600 are nulls OR having 300 columns, out of which 250 are nulls) 2. There are visible space savings when number of columns containing NULLS are more than 60~70% in table have large number of columns. Approximately 11% for table (having 800 cols, out of which 600 are nulls OR having 300 columns, out of which 250 are nulls) Refer below link for detailed results: http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382853AA4 0@szxeml509-mbs Observation from original pgbench -- 1. There is 1% performance dip for original pgbench, this could be due to fluctuation in readings or some consequences of addition of code which is difficult to reason out. Refer below link for detailed results: http://www.postgresql.org/message-id/00bc01cde1f7$be4b4cb0$3ae1e610$@kapila@ huawei.com With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Amit Kapila amit.kap...@huawei.com writes: I will summarize the results, and if most of us feel that they are not good enough, then we can return this patch. Aside from the question of whether there's really any generally-useful performance improvement from this patch, it strikes me that this change forecloses other games that we might want to play with interpretation of the value of a tuple's natts. In particular, when I was visiting Salesforce last week, the point came up that they'd really like ALTER TABLE ADD COLUMN to be free even when the column had a non-null DEFAULT. It's not too difficult to imagine how we might support that, at least when the default expression is a constant: decree that if the requested attribute number is beyond natts, look aside at the tuple descriptor to see if the column is marked as having a magic default value, and if so, substitute that rather than just returning NULL. (This has to be a magic value separate from the current default, else a subsequent DROP or SET DEFAULT would do the wrong thing.) However, this idea conflicts with an optimization that supposes it can reduce natts to suppress null columns: if the column was actually stored as null, you'd lose that information, and would incorrectly return the magic default on subsequent reads. I think it might be possible to make both ideas play together, by not reducing natts further than the last column with a magic default. However, that means extra complexity in heap_form_tuple, which would invalidate the performance measurements done in support of this patch. In short, there's no free lunch ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 24 June 2013 15:49, Tom Lane t...@sss.pgh.pa.us wrote: Amit Kapila amit.kap...@huawei.com writes: I will summarize the results, and if most of us feel that they are not good enough, then we can return this patch. Aside from the question of whether there's really any generally-useful performance improvement from this patch, it strikes me that this change forecloses other games that we might want to play with interpretation of the value of a tuple's natts. In particular, when I was visiting Salesforce last week, the point came up that they'd really like ALTER TABLE ADD COLUMN to be free even when the column had a non-null DEFAULT. It's not too difficult to imagine how we might support that, at least when the default expression is a constant: decree that if the requested attribute number is beyond natts, look aside at the tuple descriptor to see if the column is marked as having a magic default value, and if so, substitute that rather than just returning NULL. (This has to be a magic value separate from the current default, else a subsequent DROP or SET DEFAULT would do the wrong thing.) Now that is a mighty fine idea. However, this idea conflicts with an optimization that supposes it can reduce natts to suppress null columns: if the column was actually stored as null, you'd lose that information, and would incorrectly return the magic default on subsequent reads. I think it might be possible to make both ideas play together, by not reducing natts further than the last column with a magic default. However, that means extra complexity in heap_form_tuple, which would invalidate the performance measurements done in support of this patch. In short, there's no free lunch ... Agreed. I think its rather a shame that the proponents of this patch have tried so hard to push this through without working variations on the theme. Please guys, go away and get creative; rethink the approach so that you can have your lunch without anybody else losing theirs. I would add that I have seen the use case and want to support it, but we're looking to add to the codebase not just steal small bites of performance from existing use cases. As a practical suggestion, why not change the macro so it doesn't even try to do anything different unless the number of columns is 72. A constant comparison should go very quickly and isolate the code better from the more typical code path. If not, lets try some other ideas, like Tom just did. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Mon, Jun 24, 2013 at 4:05 PM, Simon Riggs si...@2ndquadrant.com wrote: I think its rather a shame that the proponents of this patch have tried so hard to push this through without working variations on the theme. Please guys, go away and get creative; rethink the approach so that you can have your lunch without anybody else losing theirs. I would add that I have seen the use case and want to support it, but we're looking to add to the codebase not just steal small bites of performance from existing use cases. If there's an actual performance consequence of applying this patch, then I think that's a good reason for rejecting it. But if the best argument we can come up with is that we might someday try to do even more clever things with the tuple's natts value, I guess I'm not very impressed. The reason why we have to rewrite the table when someone adds a column with a non-NULL default is because we need to store the new value in every row. Sure, we could defer that in this particular case. But users might be mighty dismayed to see CLUSTER or VACUUM FULL -- or a dump-and-reload! -- cause the table to become much LARGER than it was before. Having some kind of column-oriented compression would be darn nifty, but this particular path doesn't seem particularly promising to me. I think the larger and more fundamental problem with natts is that there's just not very much bit space available there. Even as it is, someone who adds and drops columns with any regularity will eventually hit the wall, and there aren't any good alternatives at that point. Were there more bit space available here, we could even do things like allow some special cases of ALTER TABLE .. SET DATA TYPE not to rewrite the table; natts could become a sort of tuple version number, and we'd remember how to convert to newer versions on the fly. But with only 2048 values available, it's not really feasible to start consuming them for any more operations than we already do. I'm generally a big fan of the principle that no single patch should be allowed to crowd out room for future extensibility in this particular area, but in this case I think the bit space available is *already* so limited that we're not likely to get much further with it without changing the storage format. So, Tom, how's that pluggable storage format coming? :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Simon, I think its rather a shame that the proponents of this patch have tried so hard to push this through without working variations on the theme. Please guys, go away and get creative; rethink the approach so that you can have your lunch without anybody else losing theirs. I would add that I have seen the use case and want to support it, but we're looking to add to the codebase not just steal small bites of performance from existing use cases. Do we really have ironclad numbers which show that the patch affects performance negatively? I didn't think the previous performance test was comprehensive; I was planning to develop one myself this week. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Robert Haas robertmh...@gmail.com writes: If there's an actual performance consequence of applying this patch, then I think that's a good reason for rejecting it. But if the best argument we can come up with is that we might someday try to do even more clever things with the tuple's natts value, I guess I'm not very impressed. The reason why we have to rewrite the table when someone adds a column with a non-NULL default is because we need to store the new value in every row. Sure, we could defer that in this particular case. But users might be mighty dismayed to see CLUSTER or VACUUM FULL -- or a dump-and-reload! -- cause the table to become much LARGER than it was before. Having some kind of column-oriented compression would be darn nifty, but this particular path doesn't seem particularly promising to me. The point of what I was suggesting isn't to conserve storage, but to reduce downtime during a schema change. Remember that a rewriting ALTER TABLE locks everyone out of that table for a long time. So, Tom, how's that pluggable storage format coming? :-) Well, actually, it's looking to me like heap_form_tuple will be underneath the API cut, because alternate storage managers will probably have other tuple storage formats --- column stores being the poster child here, but in any case the tuple header format is very unlikely to be universal. Which means that whether this patch gets applied to mainline is going to be moot for Salesforce's purposes; they will certainly want the equivalent logic in their storage code, because they've got tables with many hundreds of mostly-null columns, but whether heap_form_tuple acts this way or not won't affect them. So unless we consider that many-hundreds-of-columns is a design center for general purpose use of Postgres, we should be evaluating this patch strictly on its usefulness for more typical table widths. And my take on that is that (1) lots of columns isn't our design center (for the reasons you mentioned among others), and (2) the case for the patch looks pretty weak otherwise. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 24 June 2013 21:50, Tom Lane t...@sss.pgh.pa.us wrote: So, Tom, how's that pluggable storage format coming? :-) Well, actually, it's looking to me like heap_form_tuple will be underneath the API cut, because alternate storage managers will probably have other tuple storage formats --- column stores being the poster child here, but in any case the tuple header format is very unlikely to be universal. Hopefully we would allow multiple storage managers to be active at once, one per table? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
i believe the last submission of the patch had no negative performance impact on any of the tested use cases, but you'd have to go back and see the last exchange on that. i think it was the concern about potentially regressing the codeline in unforeseen ways without a clear benefit to all but one use case (wide tables) that stalled things. From: Josh Berkus j...@agliodbs.com To: Simon Riggs si...@2ndquadrant.com Cc: Tom Lane t...@sss.pgh.pa.us; Amit Kapila amit.kap...@huawei.com; Andres Freund and...@2ndquadrant.com; Craig Ringer cr...@2ndquadrant.com; Jameison Martin jameis...@yahoo.com; Noah Misch n...@leadboat.com; Kevin Grittner kgri...@mail.com; robertmh...@gmail.com; pgsql-hackers@postgresql.org Sent: Monday, June 24, 2013 10:32 PM Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review] Simon, I think its rather a shame that the proponents of this patch have tried so hard to push this through without working variations on the theme. Please guys, go away and get creative; rethink the approach so that you can have your lunch without anybody else losing theirs. I would add that I have seen the use case and want to support it, but we're looking to add to the codebase not just steal small bites of performance from existing use cases. Do we really have ironclad numbers which show that the patch affects performance negatively? I didn't think the previous performance test was comprehensive; I was planning to develop one myself this week. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 06/24/2013 01:50 PM, Tom Lane wrote: The point of what I was suggesting isn't to conserve storage, but to reduce downtime during a schema change. Remember that a rewriting ALTER TABLE locks everyone out of that table for a long time. Right, but I'm worried about the surprise! factor. That is, if we make the first default free by using a magic value, then a SET DEFAULT on that column is going to have some very surprising results as suddenly the whole table needs to get written out for the old default. In many use cases, this would still be a net win, since 80% of the time users don't change defaults after column creation. But we'd have to make it much less surprising somehow. Also for the reason Tom pointed out, the optimization would only work on with NOT NULL columns ... leading to another potential unexpected surprise when the column is marked NULLable. So unless we consider that many-hundreds-of-columns is a design center for general purpose use of Postgres, we should be evaluating this patch strictly on its usefulness for more typical table widths. And my take on that is that (1) lots of columns isn't our design center (for the reasons you mentioned among others), and (2) the case for the patch looks pretty weak otherwise. Well, actually, hundreds of columns is reasonably common for a certain user set (ERP, CRM, etc.). If we could handle that use case very efficiently, then it would win us some users, since other RDMBSes don't. However, there are multiple issues with having hundreds of columns, of which storage optimization is only one ... and probably the smallest one at that. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Josh Berkus j...@agliodbs.com writes: On 06/24/2013 01:50 PM, Tom Lane wrote: The point of what I was suggesting isn't to conserve storage, but to reduce downtime during a schema change. Remember that a rewriting ALTER TABLE locks everyone out of that table for a long time. Right, but I'm worried about the surprise! factor. That is, if we make the first default free by using a magic value, then a SET DEFAULT on that column is going to have some very surprising results as suddenly the whole table needs to get written out for the old default. No, that's why we'd store the magic default separately. That will be permanent and unaffected by later SET DEFAULT operations. (This requires that every subsequently created tuple store the column explicitly so that the magic default doesn't affect it; which is exactly why there's a conflict with the currently-proposed patch.) ... Also for the reason Tom pointed out, the optimization would only work on with NOT NULL columns ... leading to another potential unexpected surprise when the column is marked NULLable. Huh? We already have the case of null default handled. Well, actually, hundreds of columns is reasonably common for a certain user set (ERP, CRM, etc.). If we could handle that use case very efficiently, then it would win us some users, since other RDMBSes don't. However, there are multiple issues with having hundreds of columns, of which storage optimization is only one ... and probably the smallest one at that. Agreed; there are a lot of things we'd have to address if we really wanted to claim this is a domain we work well in. (I suspect Salesforce will be chipping away at some of those issues, but as I said, heap_form_tuple is not in their critical path.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On 06/24/2013 02:21 PM, Tom Lane wrote: Right, but I'm worried about the surprise! factor. That is, if we make the first default free by using a magic value, then a SET DEFAULT on that column is going to have some very surprising results as suddenly the whole table needs to get written out for the old default. No, that's why we'd store the magic default separately. That will be permanent and unaffected by later SET DEFAULT operations. (This requires that every subsequently created tuple store the column explicitly so that the magic default doesn't affect it; which is exactly why there's a conflict with the currently-proposed patch.) Yah. So how likely is this to get done sometime in the next 2 releases? It's only a conflict if someone is going to write the code ... Agreed; there are a lot of things we'd have to address if we really wanted to claim this is a domain we work well in. (I suspect Salesforce will be chipping away at some of those issues, but as I said, heap_form_tuple is not in their critical path.) Well, doing the trailing column truncation as part of a general plan to make having 600 columns less painful would make more sense than doing it on its own. For my personal use case(s), I don't really care about the null bitmap that much; that amount of space simply isn't that significant compared to the other performance issues involved. I started evaluating this patch just because I'm one of the few people with a realistic test case. Anyway, let's decide if acceptance of this patch hinges on performance or not. I'll take me a whole evening to set up a good performance test, so I don't want to do it if it's going to be a moot point. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Mon, Jun 24, 2013 at 4:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: The point of what I was suggesting isn't to conserve storage, but to reduce downtime during a schema change. Remember that a rewriting ALTER TABLE locks everyone out of that table for a long time. Sure, I understand. As Josh says, though, it'd still be potentially quite surprising. So, Tom, how's that pluggable storage format coming? :-) Well, actually, it's looking to me like heap_form_tuple will be underneath the API cut, because alternate storage managers will probably have other tuple storage formats --- column stores being the poster child here, but in any case the tuple header format is very unlikely to be universal. I've had the same thought. Unfortunately, there are a lot of things - like the TupleTableSlot abstraction - that are fairly deeply in bed with the format used by our current heap AM; so we might be imposing a performance overhead for anyone who uses some other representation. Unless you have some idea how to avoid that? Which means that whether this patch gets applied to mainline is going to be moot for Salesforce's purposes; they will certainly want the equivalent logic in their storage code, because they've got tables with many hundreds of mostly-null columns, but whether heap_form_tuple acts this way or not won't affect them. OK. So unless we consider that many-hundreds-of-columns is a design center for general purpose use of Postgres, we should be evaluating this patch strictly on its usefulness for more typical table widths. And my take on that is that (1) lots of columns isn't our design center (for the reasons you mentioned among others), and (2) the case for the patch looks pretty weak otherwise. I guess I have yet to be convinced that it really hurts anything. Your example seems fairly hypothetical and could easily be something no one ever implements. I don't feel terribly strongly that we need to take this patch, and I don't really care that much whether we do or not; I'm just not really convinced there's any actual evidence that we shouldn't. The standard isn't that it's got to make something really important better; it's just got to make something better without making anything more important worse. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Mon, Jun 24, 2013 at 01:32:41PM -0700, Josh Berkus wrote: Do we really have ironclad numbers which show that the patch affects performance negatively? I didn't think the previous performance test was comprehensive; I was planning to develop one myself this week. Not ironclad, no: http://www.postgresql.org/message-id/20130124021205.gb29...@tornado.leadboat.com -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
On Thursday, January 24, 2013 7:42 AM Noah Misch wrote: On Wed, Jan 09, 2013 at 11:22:06AM +, Simon Riggs wrote: On 24 December 2012 16:57, Amit Kapila amit.kap...@huawei.com wrote: Performance: Average of 3 runs of pgbench in tps 9.3devel | with trailing null patch --+-- 578.9872 | 573.4980 On balance, it would seem optimizing for this special case would affect everybody negatively; not much, but enough. Which means we should rekect this patch. Do you have a reason why a different view might be taken? We've mostly ignored performance changes of a few percentage points, because the global consequences of adding or removing code to the binary image can easily change performance that much. It would be great to get to the point where we can reason about 1% performance changes, but we're not there. If a measured +1% performance change would have yielded a different decision, we should rethink the decision based on more-robust criteria. Today, I had taken one more set of readings of original pg_bench which are below in this mail. The difference is that this set of readings are on Suse 11 and with Shared buffers - 4G IMO, the changes in this patch are not causing any regression, however it gives performance/size reduction in some of the cases as shown by data in previous mails. So the point to decide is whether the usecases in which it is giving benefit are worth to have this Patch? As Tom had already raised some concern's about the code, I think the Patch can only have merit if the usecase makes sense to people. System Configuration: Hardware : 4 core (Intel(R) Xeon(R) CPU L5408 @ 2.13GHz) RAM : 24GB Operating System: Suse-Linux 11.2 x86_64 Sever Configuration: The database cluster will be initialized with locales COLLATE: C CTYPE:C MESSAGES: en_US.UTF-8 MONETARY: en_US.UTF-8 NUMERIC: en_US.UTF-8 TIME: en_US.UTF-8 shared_buffers = 4GB checkpoint_segments = 255 checkpoint_timeout = 10min pgbench: transaction type: TPC-B (sort of) scaling factor: 75 query mode: simple number of clients: 4 number of threads: 4 duration: 300 s originalpatch Run-1: 311 312 Run-2: 307 302 Run-3: 300 312 Run-4: 285 295 Run-5: 308 305 Avg : 302.2 305.2 With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap [Review]
Greg Stark st...@mit.edu writes: That said I don't know just how common that usage pattern is. And I'm not sure how many of those people would be able to arrange for the null columns to land at the end of the row. Obviously, they can't arrange that all the time (else their trailing columns are unnecessary). It'd have to be something like ordering the columns in descending probability of being not-null --- and unless the later columns *all* have very small probability of being not-null, you're not gonna win much. Much as I hate to suggest that somebody consider EAV-like storage, one does have to wonder whether a schema like that doesn't need redesign more than it needs an implementation tweak. It's a bit frustrating because it does seem like if it had been written this way to begin with nobody would every have questioned whether it was a good idea and nobody would ever have suggested ripping it out. True, but we'd also have a much higher probability that there aren't latent bugs because of expectations that t_natts is trustworthy. It's the cost of getting from here to there that's scaring me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers