Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Simon Riggs
On Wed, 2005-04-20 at 09:42 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote:
> >> My guess is that the FILLFACTOR will actually be a GUC variable rather
> >> than a clause to CREATE INDEX or REINDEX or CLUSTER.
> 
> > I hope not, but it would be easier to set like that and a lot easier to
> > code up that way.
> 
> I don't like that either.  It seems highly unlikely that every table in
> a database would need the same fillfactor.  It should be a per-table
> attribute, maybe with a DB-wide default, much like we handle per-column
> statistics targets.
> 
> My thought about this is that the case where extra free space really
> helps is UPDATE: if heap_update can put the new tuple version on the
> same page where the old one is, there's a noticeable savings.  So what
> you'd probably want is that heap_insert respects fillfactor (ie, won't
> eat the last fillfactor percent of a page) but heap_update doesn't,
> if it can thereby put the update on the same page.  As long as you
> vacuum before a particular page is entirely filled by updates of its
> own tuples, you have a steady-state situation.

Agreed. Thats the best approach.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Bruce Momjian
Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote:
> >> My guess is that the FILLFACTOR will actually be a GUC variable rather
> >> than a clause to CREATE INDEX or REINDEX or CLUSTER.
> 
> > I hope not, but it would be easier to set like that and a lot easier to
> > code up that way.
> 
> I don't like that either.  It seems highly unlikely that every table in
> a database would need the same fillfactor.  It should be a per-table
> attribute, maybe with a DB-wide default, much like we handle per-column
> statistics targets.

That is fine.  I just don't want to add the FILLFACTOR clause to several
commands.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote:
>> My guess is that the FILLFACTOR will actually be a GUC variable rather
>> than a clause to CREATE INDEX or REINDEX or CLUSTER.

> I hope not, but it would be easier to set like that and a lot easier to
> code up that way.

I don't like that either.  It seems highly unlikely that every table in
a database would need the same fillfactor.  It should be a per-table
attribute, maybe with a DB-wide default, much like we handle per-column
statistics targets.

My thought about this is that the case where extra free space really
helps is UPDATE: if heap_update can put the new tuple version on the
same page where the old one is, there's a noticeable savings.  So what
you'd probably want is that heap_insert respects fillfactor (ie, won't
eat the last fillfactor percent of a page) but heap_update doesn't,
if it can thereby put the update on the same page.  As long as you
vacuum before a particular page is entirely filled by updates of its
own tuples, you have a steady-state situation.

I don't see a lot of point in Hannu's suggestion about leaving every Nth
page free.  Once you've had to move off the source page, near or far
makes little difference.  It might win if we were also trying to teach
insert and update to preserve CLUSTER ordering --- but that is an order
of magnitude harder than what's been discussed in this thread, and I'm
unconvinced it's a good idea anyway ...

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Simon Riggs
On Wed, 2005-04-20 at 09:03 -0400, Bruce Momjian wrote:
> My guess is that the FILLFACTOR will actually be a GUC variable rather
> than a clause to CREATE INDEX or REINDEX or CLUSTER.

I hope not, but it would be easier to set like that and a lot easier to
code up that way.

Trouble is, if you need it, it needs to be varied according to what is
happening to a particular relation. Otherwise you'd gain benefit in one
area and lose it in another.

Best Regards, Simon Riggs




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Simon Riggs
On Wed, 2005-04-20 at 02:08 +0300, Hannu Krosing wrote:
> > Anyway, when I get time, I'll finish off the patch. Unless other readers
> > would like to follow on.
> 
> I hope you will you will get that time before 8.1 ships :)

It's very unlikely, sorry.

I'm working towards improving Business Intelligence workloads now and
this patch isn't in that direction. BI likes full blocks (and no
spaces).

The sparse heap idea would use resources better than a smooth spread and
is probably just as easy to code. The main problem is the FSM is only
currently filled at VACUUM time, so putting spaces in tables wouldn't
help much until after the first vacuum of a table.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Bruce Momjian

My guess is that the FILLFACTOR will actually be a GUC variable rather
than a clause to CREATE INDEX or REINDEX or CLUSTER.

---

Hannu Krosing wrote:
> On T, 2005-04-12 at 18:38 +0100, Simon Riggs wrote:
> > On Tue, 2005-04-12 at 09:56 -0400, Tom Lane wrote:
> 
> (Neil, I added you to CC: to show you at least two more places where
> sparse heaps can be generally useful and not tweaks for single
> benchmark)
> 
> > > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > > During recent tuning of the TPC-C workload, I produced the following
> > > > patch to force COPY to leave some space in each data block when it loads
> > > > data into heap relations.
> 
> When I comtemplated a similar idea some time ago, my primary goal was
> reducing disk head movement during massive updates. 
> 
> At that time it seemed to me cheaper to not leave space in each page,
> but to leave each Nth page empty, as more new tuples will be punt on the
> same page and thus cause less WAL writes. 
> 
> Warning: This may be a false assumption - I did not check from code, if
> this is actually so for any or even a significant number of cases.
> 
> > > Put the info into the Relation
> > > structure instead of cluttering heap_insert's API.  (This would mean
> > > that *every* heap_insert is affected, not only COPY, which is what you
> > > want anyway I would think.)
> > 
> > Well, I intended it to work *only* for copy, not for insert, which is
> > why I did it that way. 
> 
> To be more generally useful similar thing should be added to VACUUM FULL
> and CLUSTER.
> 
> And perhaps some weird LAZY version of "VACUUM EXPAND" could be written
> as well, for keeping the holes from filling up in constantly growing
> databases.
> 
> Having these holes is also essential, if we want a cheap way to keep
> data in CLUSTER order after initial CLUSTER command - if we do have free
> space everywhere in the file, we can just put each new tuple on the
> first page with free space on or after its preceeding tuple in cluster
> index.
> 
> > Anyway, when I get time, I'll finish off the patch. Unless other readers
> > would like to follow on.
> 
> I hope you will you will get that time before 8.1 ships :)
> 
> -- 
> Hannu Krosing <[EMAIL PROTECTED]>
> 
> ---(end of broadcast)---
> TIP 7: don't forget to increase your free space map settings
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Hannu Krosing
On T, 2005-04-12 at 18:38 +0100, Simon Riggs wrote:
> On Tue, 2005-04-12 at 09:56 -0400, Tom Lane wrote:

(Neil, I added you to CC: to show you at least two more places where
sparse heaps can be generally useful and not tweaks for single
benchmark)

> > Simon Riggs <[EMAIL PROTECTED]> writes:
> > > During recent tuning of the TPC-C workload, I produced the following
> > > patch to force COPY to leave some space in each data block when it loads
> > > data into heap relations.

When I comtemplated a similar idea some time ago, my primary goal was
reducing disk head movement during massive updates. 

At that time it seemed to me cheaper to not leave space in each page,
but to leave each Nth page empty, as more new tuples will be punt on the
same page and thus cause less WAL writes. 

Warning: This may be a false assumption - I did not check from code, if
this is actually so for any or even a significant number of cases.

> > Put the info into the Relation
> > structure instead of cluttering heap_insert's API.  (This would mean
> > that *every* heap_insert is affected, not only COPY, which is what you
> > want anyway I would think.)
> 
> Well, I intended it to work *only* for copy, not for insert, which is
> why I did it that way. 

To be more generally useful similar thing should be added to VACUUM FULL
and CLUSTER.

And perhaps some weird LAZY version of "VACUUM EXPAND" could be written
as well, for keeping the holes from filling up in constantly growing
databases.

Having these holes is also essential, if we want a cheap way to keep
data in CLUSTER order after initial CLUSTER command - if we do have free
space everywhere in the file, we can just put each new tuple on the
first page with free space on or after its preceeding tuple in cluster
index.

> Anyway, when I get time, I'll finish off the patch. Unless other readers
> would like to follow on.

I hope you will you will get that time before 8.1 ships :)

-- 
Hannu Krosing <[EMAIL PROTECTED]>

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] COPY Fillfactor patch

2005-04-12 Thread Simon Riggs
On Tue, 2005-04-12 at 09:56 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > During recent tuning of the TPC-C workload, I produced the following
> > patch to force COPY to leave some space in each data block when it loads
> > data into heap relations.
> 

> Put the info into the Relation
> structure instead of cluttering heap_insert's API.  (This would mean
> that *every* heap_insert is affected, not only COPY, which is what you
> want anyway I would think.)

Well, I intended it to work *only* for copy, not for insert, which is
why I did it that way. 

I can see both sides of that discussion though.

> In practice, a real fix for this would add a column to pg_class, in
> which case you could get it from the Relation for free.

Again, that was intentional when I wrote it. I am inclined to agree with
that approach now, since it could be cached fairly easily.

Anyway, when I get time, I'll finish off the patch. Unless other readers
would like to follow on.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] COPY Fillfactor patch

2005-04-12 Thread Simon Riggs
On Wed, 2005-04-13 at 00:00 +1000, Neil Conway wrote:
> Simon Riggs wrote:
> > During recent tuning of the TPC-C workload, I produced the following
> > patch to force COPY to leave some space in each data block when it loads
> > data into heap relations.
> 
> I can't get too excited about incorporating changes designed solely to 
> improve performance for the workload of a specific database benchmark. 

Sometimes we tune for specific workloads, sometimes we need to tune to a
generic design pattern that effects many users. Unisys wanted to test
and tune a workload that would improve things for the most number of
users and I would say I support them in that.

Database benchmarks exist for two reasons:
- generate some great numbers
- they offer a generic workload that stresses PostgreSQL in pseudo-real
customer situations, but can be easily re-run, discussed, published and
dissected for real insight 

The first reason helps people to accurately size systems and the second
reason helps them save them time and moneybut without the first
people buy the wrong systems and waste money anyway. IMHO if you care
about the second, you should also care about the first.

Seriously, if you know a workload that better represents the majority of
performance critical database applications then I'll be happy to
consider tuning for that at another time. (Seriously).

> If the change has merit in some plausible "real world" situations, so be 
> it -- but if not, I don't see the point.

Well, I've used PCTFREE many times with Oracle and DB2. Did it do any
good? Well, thats much harder, because you don't get as much chance to
tune specific issues like that in the real world, but yes, it makes a
difference for *some* real workloads. All? No.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PATCHES] COPY Fillfactor patch

2005-04-12 Thread Neil Conway
Simon Riggs wrote:
During recent tuning of the TPC-C workload, I produced the following
patch to force COPY to leave some space in each data block when it loads
data into heap relations.
I can't get too excited about incorporating changes designed solely to 
improve performance for the workload of a specific database benchmark. 
If the change has merit in some plausible "real world" situations, so be 
it -- but if not, I don't see the point.

Each UPDATE tries to insert a new row version. The blocks are all full,
so each UPDATE causes this activity:
read block, decide cannot insert, unlock block
extend relation by 1 block
relock first block, lock new block
insert row into new block
After the patch, most UPDATEs cause only a single block access/update,
excluding the index effects.
Did you do any benchmarks to measure the performance of the patch?
-Neil
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] COPY Fillfactor patch

2005-04-12 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> During recent tuning of the TPC-C workload, I produced the following
> patch to force COPY to leave some space in each data block when it loads
> data into heap relations.

This is an amazingly ugly way to do it.  Put the info into the Relation
structure instead of cluttering heap_insert's API.  (This would mean
that *every* heap_insert is affected, not only COPY, which is what you
want anyway I would think.)

In practice, a real fix for this would add a column to pg_class, in
which case you could get it from the Relation for free.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend