Re: [PATCHES] COPY Fillfactor patch
On Wed, 2005-04-20 at 09:42 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote: > >> My guess is that the FILLFACTOR will actually be a GUC variable rather > >> than a clause to CREATE INDEX or REINDEX or CLUSTER. > > > I hope not, but it would be easier to set like that and a lot easier to > > code up that way. > > I don't like that either. It seems highly unlikely that every table in > a database would need the same fillfactor. It should be a per-table > attribute, maybe with a DB-wide default, much like we handle per-column > statistics targets. > > My thought about this is that the case where extra free space really > helps is UPDATE: if heap_update can put the new tuple version on the > same page where the old one is, there's a noticeable savings. So what > you'd probably want is that heap_insert respects fillfactor (ie, won't > eat the last fillfactor percent of a page) but heap_update doesn't, > if it can thereby put the update on the same page. As long as you > vacuum before a particular page is entirely filled by updates of its > own tuples, you have a steady-state situation. Agreed. Thats the best approach. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] COPY Fillfactor patch
Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote: > >> My guess is that the FILLFACTOR will actually be a GUC variable rather > >> than a clause to CREATE INDEX or REINDEX or CLUSTER. > > > I hope not, but it would be easier to set like that and a lot easier to > > code up that way. > > I don't like that either. It seems highly unlikely that every table in > a database would need the same fillfactor. It should be a per-table > attribute, maybe with a DB-wide default, much like we handle per-column > statistics targets. That is fine. I just don't want to add the FILLFACTOR clause to several commands. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] COPY Fillfactor patch
Simon Riggs <[EMAIL PROTECTED]> writes: > On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote: >> My guess is that the FILLFACTOR will actually be a GUC variable rather >> than a clause to CREATE INDEX or REINDEX or CLUSTER. > I hope not, but it would be easier to set like that and a lot easier to > code up that way. I don't like that either. It seems highly unlikely that every table in a database would need the same fillfactor. It should be a per-table attribute, maybe with a DB-wide default, much like we handle per-column statistics targets. My thought about this is that the case where extra free space really helps is UPDATE: if heap_update can put the new tuple version on the same page where the old one is, there's a noticeable savings. So what you'd probably want is that heap_insert respects fillfactor (ie, won't eat the last fillfactor percent of a page) but heap_update doesn't, if it can thereby put the update on the same page. As long as you vacuum before a particular page is entirely filled by updates of its own tuples, you have a steady-state situation. I don't see a lot of point in Hannu's suggestion about leaving every Nth page free. Once you've had to move off the source page, near or far makes little difference. It might win if we were also trying to teach insert and update to preserve CLUSTER ordering --- but that is an order of magnitude harder than what's been discussed in this thread, and I'm unconvinced it's a good idea anyway ... regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] COPY Fillfactor patch
On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote: > My guess is that the FILLFACTOR will actually be a GUC variable rather > than a clause to CREATE INDEX or REINDEX or CLUSTER. I hope not, but it would be easier to set like that and a lot easier to code up that way. Trouble is, if you need it, it needs to be varied according to what is happening to a particular relation. Otherwise you'd gain benefit in one area and lose it in another. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] COPY Fillfactor patch
On Wed, 2005-04-20 at 02:08 +0300, Hannu Krosing wrote: > > Anyway, when I get time, I'll finish off the patch. Unless other readers > > would like to follow on. > > I hope you will you will get that time before 8.1 ships :) It's very unlikely, sorry. I'm working towards improving Business Intelligence workloads now and this patch isn't in that direction. BI likes full blocks (and no spaces). The sparse heap idea would use resources better than a smooth spread and is probably just as easy to code. The main problem is the FSM is only currently filled at VACUUM time, so putting spaces in tables wouldn't help much until after the first vacuum of a table. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] COPY Fillfactor patch
My guess is that the FILLFACTOR will actually be a GUC variable rather than a clause to CREATE INDEX or REINDEX or CLUSTER. --- Hannu Krosing wrote: > On T, 2005-04-12 at 18:38 +0100, Simon Riggs wrote: > > On Tue, 2005-04-12 at 09:56 -0400, Tom Lane wrote: > > (Neil, I added you to CC: to show you at least two more places where > sparse heaps can be generally useful and not tweaks for single > benchmark) > > > > Simon Riggs <[EMAIL PROTECTED]> writes: > > > > During recent tuning of the TPC-C workload, I produced the following > > > > patch to force COPY to leave some space in each data block when it loads > > > > data into heap relations. > > When I comtemplated a similar idea some time ago, my primary goal was > reducing disk head movement during massive updates. > > At that time it seemed to me cheaper to not leave space in each page, > but to leave each Nth page empty, as more new tuples will be punt on the > same page and thus cause less WAL writes. > > Warning: This may be a false assumption - I did not check from code, if > this is actually so for any or even a significant number of cases. > > > > Put the info into the Relation > > > structure instead of cluttering heap_insert's API. (This would mean > > > that *every* heap_insert is affected, not only COPY, which is what you > > > want anyway I would think.) > > > > Well, I intended it to work *only* for copy, not for insert, which is > > why I did it that way. > > To be more generally useful similar thing should be added to VACUUM FULL > and CLUSTER. > > And perhaps some weird LAZY version of "VACUUM EXPAND" could be written > as well, for keeping the holes from filling up in constantly growing > databases. > > Having these holes is also essential, if we want a cheap way to keep > data in CLUSTER order after initial CLUSTER command - if we do have free > space everywhere in the file, we can just put each new tuple on the > first page with free space on or after its preceeding tuple in cluster > index. > > > Anyway, when I get time, I'll finish off the patch. Unless other readers > > would like to follow on. > > I hope you will you will get that time before 8.1 ships :) > > -- > Hannu Krosing <[EMAIL PROTECTED]> > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings > -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] COPY Fillfactor patch
On T, 2005-04-12 at 18:38 +0100, Simon Riggs wrote: > On Tue, 2005-04-12 at 09:56 -0400, Tom Lane wrote: (Neil, I added you to CC: to show you at least two more places where sparse heaps can be generally useful and not tweaks for single benchmark) > > Simon Riggs <[EMAIL PROTECTED]> writes: > > > During recent tuning of the TPC-C workload, I produced the following > > > patch to force COPY to leave some space in each data block when it loads > > > data into heap relations. When I comtemplated a similar idea some time ago, my primary goal was reducing disk head movement during massive updates. At that time it seemed to me cheaper to not leave space in each page, but to leave each Nth page empty, as more new tuples will be punt on the same page and thus cause less WAL writes. Warning: This may be a false assumption - I did not check from code, if this is actually so for any or even a significant number of cases. > > Put the info into the Relation > > structure instead of cluttering heap_insert's API. (This would mean > > that *every* heap_insert is affected, not only COPY, which is what you > > want anyway I would think.) > > Well, I intended it to work *only* for copy, not for insert, which is > why I did it that way. To be more generally useful similar thing should be added to VACUUM FULL and CLUSTER. And perhaps some weird LAZY version of "VACUUM EXPAND" could be written as well, for keeping the holes from filling up in constantly growing databases. Having these holes is also essential, if we want a cheap way to keep data in CLUSTER order after initial CLUSTER command - if we do have free space everywhere in the file, we can just put each new tuple on the first page with free space on or after its preceeding tuple in cluster index. > Anyway, when I get time, I'll finish off the patch. Unless other readers > would like to follow on. I hope you will you will get that time before 8.1 ships :) -- Hannu Krosing <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] COPY Fillfactor patch
On Tue, 2005-04-12 at 09:56 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > During recent tuning of the TPC-C workload, I produced the following > > patch to force COPY to leave some space in each data block when it loads > > data into heap relations. > > Put the info into the Relation > structure instead of cluttering heap_insert's API. (This would mean > that *every* heap_insert is affected, not only COPY, which is what you > want anyway I would think.) Well, I intended it to work *only* for copy, not for insert, which is why I did it that way. I can see both sides of that discussion though. > In practice, a real fix for this would add a column to pg_class, in > which case you could get it from the Relation for free. Again, that was intentional when I wrote it. I am inclined to agree with that approach now, since it could be cached fairly easily. Anyway, when I get time, I'll finish off the patch. Unless other readers would like to follow on. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] COPY Fillfactor patch
On Wed, 2005-04-13 at 00:00 +1000, Neil Conway wrote: > Simon Riggs wrote: > > During recent tuning of the TPC-C workload, I produced the following > > patch to force COPY to leave some space in each data block when it loads > > data into heap relations. > > I can't get too excited about incorporating changes designed solely to > improve performance for the workload of a specific database benchmark. Sometimes we tune for specific workloads, sometimes we need to tune to a generic design pattern that effects many users. Unisys wanted to test and tune a workload that would improve things for the most number of users and I would say I support them in that. Database benchmarks exist for two reasons: - generate some great numbers - they offer a generic workload that stresses PostgreSQL in pseudo-real customer situations, but can be easily re-run, discussed, published and dissected for real insight The first reason helps people to accurately size systems and the second reason helps them save them time and moneybut without the first people buy the wrong systems and waste money anyway. IMHO if you care about the second, you should also care about the first. Seriously, if you know a workload that better represents the majority of performance critical database applications then I'll be happy to consider tuning for that at another time. (Seriously). > If the change has merit in some plausible "real world" situations, so be > it -- but if not, I don't see the point. Well, I've used PCTFREE many times with Oracle and DB2. Did it do any good? Well, thats much harder, because you don't get as much chance to tune specific issues like that in the real world, but yes, it makes a difference for *some* real workloads. All? No. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PATCHES] COPY Fillfactor patch
Simon Riggs wrote: During recent tuning of the TPC-C workload, I produced the following patch to force COPY to leave some space in each data block when it loads data into heap relations. I can't get too excited about incorporating changes designed solely to improve performance for the workload of a specific database benchmark. If the change has merit in some plausible "real world" situations, so be it -- but if not, I don't see the point. Each UPDATE tries to insert a new row version. The blocks are all full, so each UPDATE causes this activity: read block, decide cannot insert, unlock block extend relation by 1 block relock first block, lock new block insert row into new block After the patch, most UPDATEs cause only a single block access/update, excluding the index effects. Did you do any benchmarks to measure the performance of the patch? -Neil ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] COPY Fillfactor patch
Simon Riggs <[EMAIL PROTECTED]> writes: > During recent tuning of the TPC-C workload, I produced the following > patch to force COPY to leave some space in each data block when it loads > data into heap relations. This is an amazingly ugly way to do it. Put the info into the Relation structure instead of cluttering heap_insert's API. (This would mean that *every* heap_insert is affected, not only COPY, which is what you want anyway I would think.) In practice, a real fix for this would add a column to pg_class, in which case you could get it from the Relation for free. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend