Re: [HACKERS] vacuum, performance, and MVCC

Mark Woodward Fri, 23 Jun 2006 05:44:06 -0700

>> Let me ask a question, you have this hundred million row table. OK, how
>> much of that table is "read/write?" Would it be posible to divide the
>> table into two (or more) tables where one is basically static, only
>> infrequent inserts and deletes, and the other is highly updated?
>
> Well, all of it is read write... some of the data might be updated less
> frequently, but there's no way I would know which part of the data is
> that. Logically is just the same type of data... so unless I find a way
> to continuously move back and forth the data between an archive table
> and the live table, based on how active the groups are, I can't imagine
> any other way of partitioning it. And that would also mean some quite
> big load given the pretty high dynamics of the groups.
>
>> The "big" thing in performance is the amount of disk I/O, if you have a
>> smaller active table with only a single index, then you may be able to
>> cut
>> your disk I/O time really down. The smaller the row size, the more rows
>> fit into a block. The fewer blocks the less dissk I/O. The less disk I/O
>> the bbetter the performance.
>
> I agree, but it is quite hard to achieve that when the data set is both
> big AND the partitioning criteria is highly dynamic. Not to mention that
> deleting from that table is also a PITA performance-wise, so I wonder
> how well the continuous back and forth between the active and inactive
> table would do.
>
>> Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
>> frequently because the indexes grow and vacuuming them doesnt remove
>> everything, sometimes a REINDEX or a drop/recreate is the only way to
>> get
>> performance back. So if you wait too long between vacuums, your indexes
>> grow  and spread across more disk blocks than they should and thus use
>> more disk I/O to search and/or shared memory to cache.
>
> This is nice in theory, but kills performance. I vacuum the big tables
> only overnight, otherwise the server is sluggish.


Well, the only thing left is to cluster the database. There are a couple
ways to do this, one switch to a platform that supports clustering or
create an API to wrap multiple databases. If your queries are simple and
limited, you could create an HTTP/XML service that wraps a number of
postgresql databases, issues a query across all databases, merges multiple
query sets, and returns one homoginous stream.

Inserts would be handled by hash to machine weighted by number of records
on each machine.

Updates and deletes would have two keys, machine and ID.

It sounds like you have a "big" problem and you need a "big" solution.

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: [HACKERS] vacuum, performance, and MVCC

Reply via email to