Re: Re: [HACKERS] Why frequently updated tables are an issue

2004-10-21 Thread Simon Riggs

I'm thinking that this subject has some cross-overs with another topic
mentioned recently: improving VACUUM performance.

Just to recap: frequently updated tables are a problem because they grow
significantly, accumulating many dead row versions. The usual solution is
to run frequent VACUUMs on such a table, though even that isn't enough for
some update/change rates, because we need to scan the whole table rather
than just the hot blocks.
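
To make that concrete, here is a minimal sketch (the table name is purely
illustrative) of how a single hot row accumulates dead versions between
VACUUMs:

    -- a one-row counter that gets hammered with updates
    CREATE TABLE hot_counter (id integer PRIMARY KEY, hits bigint);
    INSERT INTO hot_counter VALUES (1, 0);

    -- repeated many times per second by the application
    UPDATE hot_counter SET hits = hits + 1 WHERE id = 1;

    -- each UPDATE leaves the previous version behind as a dead tuple;
    -- the live row's ctid keeps moving to a fresh slot after every update
    SELECT ctid, xmin FROM hot_counter WHERE id = 1;

    -- only a (sufficiently frequent) VACUUM reclaims the dead versions,
    -- and today it has to scan the whole table to do so
    VACUUM VERBOSE hot_counter;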

The ideas thrown around earlier this month were to create a "Cleaner"
process that performed individual block-level VACUUMs on dirty blocks
in shared_buffers, prior to their being written out by the bgwriter or at
checkpoint. (It wasn't called that at the time, IIRC, but was described
more in terms of the CTID list that would be required to manage this).

Such a Cleaner process could solve the problem of frequently updated
tables by keeping the number of row versions at a reasonably low
level. There wasn't much excitement, including from me, when the
"speed-up vacuum" idea was discussed. Thinking about the frequent-update
problem makes me want to revisit the idea now, since it has a much
clearer role in solving that problem. As I look at it now, it sounds
very much like the freelist problem did a while back, with a similar-ish
solution...

Frequent updates will happen on blocks within the T2 cache, since those
blocks are heavily accessed. The more frequently accessed they are, the
more they need cleaning. This is actually the opposite end of the cache
from the bgwriter's interest, which is in writing blocks that aren't
frequently used - to avoid having to continually rewrite the same blocks.
As a result, I'm thinking we probably wouldn't want a single queue to
manage both the blocks that need Cleaning and those that need writing.
Discuss...

Another way to do this is to find a way of identifying the specific
blocks that require Cleaning. We could use an event to fire off the
cleaning action, such as when an update causes a block split. In that
case, the older block could be added to the Cleaner's cleanlist.
The Cleaner then works through the cleanlist, doing a block-level
vacuum on each entry.

That way the Cleaner would stay a reasonable number of row versions back
from the point of update, keeping each VACUUM potentially useful, while
being fired infrequently enough not to get in the way [since, as Alvaro
has pointed out, the indexes need to be visited in addition to the main
data block - which could introduce contention for index locks, because
the frequent updater is probably using an index to reach the data].
Also, picking the block-split event would mean stopping the frequent
updater just at the point where significant extra costs start to be
incurred from all the additional row versions - when a row has versions
across many blocks.

I'm sure there are some faulty thoughts in all of that, but overall I
think that a Cleaner process to perform block-level VACUUMs becomes
more viable when it has an important problem to solve.

Best Regards, Simon Riggs



Re: [HACKERS] Why frequently updated tables are an issue

2004-10-21 Thread Josh Berkus
Chris, Manfred, David, "Mohawk",

> > What about a dblink style interface to a non-MVCC SQL database? I
> > think someone on this list mentioned that there are open source
> > in-memory SQL databases.
>
> Funny you should mention this. I'm working on a generalized one that
> uses PL/PerlU at

The problem with this approach (in PL/Perl or otherwise) is that the connection
overhead to the external data source will cost you far more than MVCC does.
The only way it could be a benefit is if you could somehow hold the
connection "open" from query to query, which SPs and functions are
ill-suited for, being more-or-less "stateless".

I've personally been looking at Memcached for this issue.  However, the 
overhead of installing memcached is substantial, and I've not been able to 
test if the in-development PLPerlU allows me to cache the Memcached request 
object somehow.  Andrew suggested a way but I don't have the software 
installed.

Handling this at the middleware layer looks increasingly attractive.

> Thanks to Josh Berkus for design ideas & implementation roadmap, and
> to Josh Drake, Andrew Dunstan, Abhijit Menon-Sen and (of course) Tom
> Lane for the infrastructure that makes this doable. :)

You're welcome ... though I'd had in mind access to legacy data rather than 
caching of disposable results.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [HACKERS] Why frequently updated tables are an issue

2004-10-20 Thread David Fetter
On Wed, Oct 20, 2004 at 07:10:35PM +0200, Manfred Spraul wrote:
> [EMAIL PROTECTED] wrote a few months ago:
> 
> >PostgreSQL's behavior in these cases is poor.  I don't think anyone
> >who has tried to use PG for this sort of thing will disagree, and
> >yes, it is getting better.  Does anyone else consider this to be a
> >problem?  If so, I'm open to suggestions on what can be done.
> >I've suggested a number of things, and admittedly they have all
> >been pretty weak ideas, but they were potentially workable.
> >
> What about a dblink style interface to a non-MVCC SQL database?  I
> think someone on this list mentioned that there are open source
> in-memory SQL databases.

Funny you should mention this.  I'm working on a generalized one that
uses PL/PerlU at

http://fetter.org/pg_ps/

Thanks to Josh Berkus for design ideas & implementation roadmap, and
to Josh Drake, Andrew Dunstan, Abhijit Menon-Sen and (of course) Tom
Lane for the infrastructure that makes this doable. :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!



Re: [HACKERS] Why frequently updated tables are an issue

2004-10-20 Thread Manfred Spraul
[EMAIL PROTECTED] wrote a few months ago:
> PostgreSQL's behavior in these cases is poor. I don't think anyone who has
> tried to use PG for this sort of thing will disagree, and yes, it is
> getting better. Does anyone else consider this to be a problem? If so, I'm
> open to suggestions on what can be done. I've suggested a number of
> things, and admittedly they have all been pretty weak ideas, but they were
> potentially workable.

What about a dblink style interface to a non-MVCC SQL database?
I think someone on this list mentioned that there are open source 
in-memory SQL databases.

--
   Manfred


Re: [HACKERS] Why frequently updated tables are an issue

2004-06-18 Thread Bruce Momjian

Is there a TODO here?  No one explained why frozen tuples are important.

---

Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
> > On 6/12/2004 3:45 PM, Tom Lane wrote:
> >> I don't think it would help very much to define a bit like that --- I
> >> can't believe that very many pages would contain only frozen tuples,
> >> unless you were to adopt an aggressive policy of using VACUUM FREEZE
> >> a lot.
> 
> > I thought this implies an aggressive policy of freezing everything by 
> > default. But I guess there is something I am not aware of that makes 
> > aggressive freezing a bad thing.
> 
> Well, it means extra I/O to freeze tuples that you otherwise probably
> never would.  So it's not obvious that aggressive freezing in hopes of
> saving cycles later is a win.
> 
> >> It might be interesting though to have some kind of "fast vacuum" mode
> >> that doesn't worry about freezing tuples, but only reclaiming dead ones.
> 
> > Wouldn't that screw the current FSM population mechanisms? Not that my 
> > suggestions above wouldn't do that either :-)
> 
> Yeah, that's another "wholesale" mechanism that we'd have to look at
> refining.
> 
>   regards, tom lane
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Why frequently updated tables are an issue

2004-06-12 Thread Bruce Momjian
Jan Wieck wrote:
> On 6/12/2004 3:45 PM, Tom Lane wrote:
> 
> > Jan Wieck <[EMAIL PROTECTED]> writes:
> >> But a per-relation bitmap that tells whether a block is a) free of dead
> >> tuples and b) all remaining tuples in it are frozen could be used to let
> >> vacuum skip such blocks (there can't be anything to do). The bit would get
> >> reset whenever the block is marked dirty. This would cause vacuum to
> >> look mainly at recently touched blocks, likely to be found in the buffer
> >> cache anyway and thus dramatically reduce the amount of I/O and thereby
> >> make high-frequency vacuuming less expensive.
> > 
> > I don't think it would help very much to define a bit like that --- I
> > can't believe that very many pages would contain only frozen tuples,
> > unless you were to adopt an aggressive policy of using VACUUM FREEZE
> > a lot.
> 
> I thought this implies an aggressive policy of freezing everything by 
> default. But I guess there is something I am not aware of that makes 
> aggressive freezing a bad thing.

Why are frozen tuples significant?  I assume any page that has no dead
tuples could be skipped by vacuum.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] Why frequently updated tables are an issue

2004-06-12 Thread Tom Lane
Jan Wieck <[EMAIL PROTECTED]> writes:
> On 6/12/2004 3:45 PM, Tom Lane wrote:
>> I don't think it would help very much to define a bit like that --- I
>> can't believe that very many pages would contain only frozen tuples,
>> unless you were to adopt an aggressive policy of using VACUUM FREEZE
>> a lot.

> I thought this implies an aggressive policy of freezing everything by 
> default. But I guess there is something I am not aware of that makes 
> aggressive freezing a bad thing.

Well, it means extra I/O to freeze tuples that you otherwise probably
never would.  So it's not obvious that aggressive freezing in hopes of
saving cycles later is a win.

>> It might be interesting though to have some kind of "fast vacuum" mode
>> that doesn't worry about freezing tuples, but only reclaiming dead ones.

> Wouldn't that screw the current FSM population mechanisms? Not that my 
> suggestions above wouldn't do that either :-)

Yeah, that's another "wholesale" mechanism that we'd have to look at
refining.

regards, tom lane



Re: [HACKERS] Why frequently updated tables are an issue

2004-06-12 Thread Jan Wieck
On 6/12/2004 3:45 PM, Tom Lane wrote:
> Jan Wieck <[EMAIL PROTECTED]> writes:
>> But a per-relation bitmap that tells whether a block is a) free of dead
>> tuples and b) all remaining tuples in it are frozen could be used to let
>> vacuum skip such blocks (there can't be anything to do). The bit would get
>> reset whenever the block is marked dirty. This would cause vacuum to
>> look mainly at recently touched blocks, likely to be found in the buffer
>> cache anyway and thus dramatically reduce the amount of I/O and thereby
>> make high-frequency vacuuming less expensive.
>
> I don't think it would help very much to define a bit like that --- I
> can't believe that very many pages would contain only frozen tuples,
> unless you were to adopt an aggressive policy of using VACUUM FREEZE
> a lot.

I thought this implies an aggressive policy of freezing everything by
default. But I guess there is something I am not aware of that makes
aggressive freezing a bad thing.

> It might be interesting though to have some kind of "fast vacuum" mode
> that doesn't worry about freezing tuples, but only reclaiming dead ones.
> This could look at only recently touched pages, with perhaps the
> cooperation of the bgwriter to keep track of candidate pages.  (You'd
> still have to do full-database scans for freezable tuples occasionally ...
> but not very often.)

Wouldn't that screw the current FSM population mechanisms? Not that my
suggestions above wouldn't do that either :-)

> The main thing I don't see how to handle efficiently is getting rid of
> the index entries for dead rows.  The present indexam API is designed
> for getting rid of index entries wholesale, but you'd need something
> that works better for retail removal to approach vacuuming this way.

Which is a problem so fundamentally embedded in the index implementation
itself that it sounds more like a wholesale replacement of the index AM
than some nifty changes.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Why frequently updated tables are an issue

2004-06-12 Thread Tom Lane
Jan Wieck <[EMAIL PROTECTED]> writes:
> But a per-relation bitmap that tells whether a block is a) free of dead
> tuples and b) all remaining tuples in it are frozen could be used to let
> vacuum skip such blocks (there can't be anything to do). The bit would get
> reset whenever the block is marked dirty. This would cause vacuum to
> look mainly at recently touched blocks, likely to be found in the buffer
> cache anyway and thus dramatically reduce the amount of I/O and thereby
> make high-frequency vacuuming less expensive.

I don't think it would help very much to define a bit like that --- I
can't believe that very many pages would contain only frozen tuples,
unless you were to adopt an aggressive policy of using VACUUM FREEZE
a lot.
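
(At the SQL level, such an aggressive policy would just mean routinely
running something like the following against each table; the table name
here is purely illustrative.)

    -- plain vacuum: reclaims dead tuples, leaves most live tuples unfrozen
    VACUUM hot_counter;

    -- aggressive policy: also freeze every qualifying live tuple, at the
    -- cost of extra I/O on pages that would otherwise not need rewriting
    VACUUM FREEZE hot_counter;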

It might be interesting though to have some kind of "fast vacuum" mode
that doesn't worry about freezing tuples, but only reclaiming dead ones.
This could look at only recently touched pages, with perhaps the
cooperation of the bgwriter to keep track of candidate pages.  (You'd
still have to do full-database scans for freezable tuples occasionally ...
but not very often.)

The main thing I don't see how to handle efficiently is getting rid of
the index entries for dead rows.  The present indexam API is designed
for getting rid of index entries wholesale, but you'd need something
that works better for retail removal to approach vacuuming this way.

regards, tom lane



Re: [HACKERS] Why frequently updated tables are an issue

2004-06-12 Thread Jan Wieck
On 6/10/2004 10:37 AM, Shridhar Daithankar wrote:
> [EMAIL PROTECTED] wrote:
>> The session table is a different issue, but has the same problems. You
>> have an active website, hundreds or thousands of hits a second, and you
>> want to manage sessions for this site. Sessions are created, updated many
>> times, and deleted. Performance degrades steadily until a vacuum. Vacuum
>> has to be run VERY frequently. Prior to lazy vacuum, this was impossible.
>>
>> Both session tables and summary tables have another thing in common: they
>> are not vital data, they hold transient state information. Yea, sure,
>> data integrity is important, but if you lose these values, you can either
>> recreate them or they aren't too important.
>>
>> Why put that in a database at all? Because, in the case of sessions
>> especially, you need to access this information for other operations. In
>> the case of summary tables, OLAP usually needs to join or include this
>> info.
>>
>> PostgreSQL's behavior in these cases is poor. I don't think anyone who has
>> tried to use PG for this sort of thing will disagree, and yes, it is
>> getting better. Does anyone else consider this to be a problem? If so, I'm
>> open to suggestions on what can be done. I've suggested a number of
>> things, and admittedly they have all been pretty weak ideas, but they were
>> potentially workable.
>
> There is another approach, so far considered non-feasible and hence
> rejected. Vacuum in PostgreSQL is tied to entire relations/objects since
> indexes do not have transaction visibility information.
>
> It has been suggested in the past to add such visibility to the index
> tuple header so that index and heap can be cleaned out of order. In such
> a case other background processes such as the background writer and the
> soon-to-be-integrated autovacuum daemon can vacuum pages/buffers rather
> than relations. That way the most-used things will remain clean and the
> cost of cleanup will remain outside the critical transaction processing
> path.

This is not feasible because at the time you update or delete a row you
would have to visit all its index entries. The performance impact of
that would be immense.

But a per-relation bitmap that tells whether a block is a) free of dead
tuples and b) all remaining tuples in it are frozen could be used to let
vacuum skip such blocks (there can't be anything to do). The bit would get
reset whenever the block is marked dirty. This would cause vacuum to
look mainly at recently touched blocks, likely to be found in the buffer
cache anyway and thus dramatically reduce the amount of I/O and thereby
make high-frequency vacuuming less expensive.

Jan
--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Why frequently updated tables are an issue

2004-06-10 Thread Shridhar Daithankar
Glen Parker wrote:
>> It has been suggested in the past to add such visibility to the index
>> tuple header so that index and heap can be cleaned out of order. In such
>> a case other background
>
> It seems to me that the benefit of this wouldn't be all that impressive
> *when accessing the cache*, which is the problem this discussion is about.

I doubt it. I have seen examples on the general list where people have
thousands of dead *pages* for a few hundred live tuples. If it is a problem
with cache access, it will spill to disk as the problem grows.

I don't think PostgreSQL's cache is that bad. No matter how you do it, RAM
is RAM. The problem is with disk bandwidth. See past discussions about the
vacuum delay patch and the improvement it brought.

Vacuum costs disk bandwidth and that affects performance. That remains a fact.

> Disk access would occur more commonly with large tables, which I'll ignore.
> Let's say total scan time for a query on a very dirty table is 100ms.  It
> seems safe to assume that the scan time for the index would be *roughly*
> half that of the heap.  If visibility could be determined by looking at just
> the index tuple, you'd cut your query scan time down to 50ms.  When the clean
> table case is 7ms total scan time, the difference between 50 and 100 ms is
> not much of an issue; either way, it's still way too high!
>
>> However increasing index footprint seems to be a tough sell.
>
> And rightly so, IMO.

Me too. Unless somebody comes up with a patch that demonstrates the
improvement. Obviously people can live with the cost of mandatory vacuum,
so this is not high priority. But one day it will be.

OTOH if the perceived benefit is not there, at least it is proven that it is
not there. I plan to do it when I find time. But again, I don't face the
problem myself (I don't even use PostgreSQL for anything important, for that
matter), so I haven't bothered spending any time on it.

As long as it is not high priority, it is going to be a tough sell. That's
not unusual.

 Shridhar


Re: [HACKERS] Why frequently updated tables are an issue

2004-06-10 Thread Glen Parker
> It has been suggested in the past to add such visibility to the index
> tuple header so that index and heap can be cleaned out of order. In such
> a case other background

It seems to me that the benefit of this wouldn't be all that impressive
*when accessing the cache*, which is the problem this discussion is about.
Disk access would occur more commonly with large tables, which I'll ignore.
Let's say total scan time for a query on a very dirty table is 100ms.  It
seems safe to assume that the scan time for the index would be *roughly*
half that of the heap.  If visibility could be determined by looking at just
the index tuple, you'd cut your query scan time down to 50ms.  When the clean
table case is 7ms total scan time, the difference between 50 and 100 ms is
not much of an issue; either way, it's still way too high!

> However increasing index footprint seems to be a tough sell.


And rightly so, IMO.

Glen Parker




Re: [HACKERS] Why frequently updated tables are an issue

2004-06-10 Thread Shridhar Daithankar
[EMAIL PROTECTED] wrote:
> The session table is a different issue, but has the same problems. You
> have an active website, hundreds or thousands of hits a second, and you
> want to manage sessions for this site. Sessions are created, updated many
> times, and deleted. Performance degrades steadily until a vacuum. Vacuum
> has to be run VERY frequently. Prior to lazy vacuum, this was impossible.
>
> Both session tables and summary tables have another thing in common: they
> are not vital data, they hold transient state information. Yea, sure,
> data integrity is important, but if you lose these values, you can either
> recreate them or they aren't too important.
>
> Why put that in a database at all? Because, in the case of sessions
> especially, you need to access this information for other operations. In
> the case of summary tables, OLAP usually needs to join or include this
> info.
>
> PostgreSQL's behavior in these cases is poor. I don't think anyone who has
> tried to use PG for this sort of thing will disagree, and yes, it is
> getting better. Does anyone else consider this to be a problem? If so, I'm
> open to suggestions on what can be done. I've suggested a number of
> things, and admittedly they have all been pretty weak ideas, but they were
> potentially workable.

There is another approach, so far considered non-feasible and hence
rejected. Vacuum in PostgreSQL is tied to entire relations/objects since
indexes do not have transaction visibility information.

It has been suggested in the past to add such visibility to the index
tuple header so that index and heap can be cleaned out of order. In such
a case other background processes such as the background writer and the
soon-to-be-integrated autovacuum daemon can vacuum pages/buffers rather
than relations. That way the most-used things will remain clean and the
cost of cleanup will remain outside the critical transaction processing
path.

However increasing the index footprint seems to be a tough sell. Besides,
FSM would need some rework to accommodate/autotune its behaviour.

I am quoting from memory, so don't flame me if I misquote it. Just adding
this to make the picture complete. From a performance point of view alone,
it could solve quite a few problems, at least in theory.

 Shridhar


Re: [HACKERS] Why frequently updated tables are an issue

2004-06-10 Thread James Robinson
On Jun 10, 2004, at 10:30 AM, [EMAIL PROTECTED] wrote:
> Prior to lazy vacuum, this was impossible.

Do you know for sure that lazy vacuum and/or autovacuum does not in fact
solve or alleviate the symptoms of the general problem of very-high-rate
table updates?

Back to lurking!

James Robinson
Socialserve.com


[HACKERS] Why frequently updated tables are an issue

2004-06-10 Thread pgsql
OK, the problem I am having with this whole discussion, on several fronts, is
the idea of performance. If performance and consistent behavior were not
*important* issues to a project, a summary table would work fine, and I
could just vacuum frequently.

Currently a client needs to vacuum two summary tables at least once every
two seconds. The performance of the system slowly declines with each
summary update, until the next vacuum. After a vacuum, the transaction
comes in at about 7ms; it increases to about 35ms~50ms, then we vacuum and
we're back to 7ms. When we vacuumed every 30 seconds, it would sometimes
get up to whole seconds.

There is an important issue here. Yes, MVCC is good. I agree, and no one
is arguing against it in a general case, however, there are classes of
problems in which MVCC, or at least PostgreSQL's implementation of it, is
not the best solution.

There are two basic problems which have been fundamental issues I've had with
PostgreSQL over the years: summary tables and session tables.

The summary tables take the place of a "select sum(col) from table" where
table is very small. The amount of vacuuming required and the steady
degradation of performance prior to each vacuum are problems that could be
addressed by some global variable system.
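
For the record, the kind of summary table I mean is nothing more exotic
than this (a minimal sketch; the names are made up, and it assumes plpgsql
is installed in the database):

    -- detail rows arrive constantly; the single summary row replaces
    -- "SELECT sum(amount) FROM detail" in the hot query path
    CREATE TABLE detail  (id serial PRIMARY KEY, amount numeric);
    CREATE TABLE summary (total numeric);
    INSERT INTO summary VALUES (0);

    CREATE FUNCTION add_to_summary() RETURNS trigger AS '
    BEGIN
        UPDATE summary SET total = total + NEW.amount;
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER detail_summary AFTER INSERT ON detail
        FOR EACH ROW EXECUTE PROCEDURE add_to_summary();

    -- every insert into detail now leaves one more dead version of the
    -- summary row behind, which is why it has to be vacuumed every couple
    -- of seconds
    VACUUM summary;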

The session table is a different issue, but has the same problems. You
have an active website, hundreds or thousands of hits a second, and you
want to manage sessions for this site. Sessions are created, updated many
times, and deleted. Performance degrades steadily until a vacuum. Vacuum
has to be run VERY frequently. Prior to lazy vacuum, this was impossible.
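
Again for concreteness, a session table is typically nothing more than
this (a rough sketch; the column names and values are only illustrative):

    CREATE TABLE sessions (
        session_id  text PRIMARY KEY,
        user_id     integer,
        data        text,
        last_seen   timestamptz NOT NULL DEFAULT now()
    );

    -- every page hit does one of these
    UPDATE sessions SET last_seen = now(), data = '...'
        WHERE session_id = 'abc123';

    -- plus periodic expiry of old sessions
    DELETE FROM sessions WHERE last_seen < now() - interval '30 minutes';

    -- so the table (and its primary key index) fills with dead tuples
    -- unless it is vacuumed very frequently
    VACUUM sessions;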

Both session tables and summary tables have another thing in common: they
are not vital data, they hold transient state information. Yea, sure,
data integrity is important, but if you lose these values, you can either
recreate them or they aren't too important.

Why put that in a database at all? Because, in the case of sessions
especially, you need to access this information for other operations. In
the case of summary tables, OLAP usually needs to join or include this
info.

PostgreSQL's behavior in these cases is poor. I don't think anyone who has
tried to use PG for this sort of thing will disagree, and yes, it is
getting better. Does anyone else consider this to be a problem? If so, I'm
open to suggestions on what can be done. I've suggested a number of
things, and admittedly they have all been pretty weak ideas, but they were
potentially workable.


