All fair questions...
Thank you for your detailed response!
On 1/4/2013 11:03 PM, Jeff Janes wrote:
On Friday, January 4, 2013, AJ Weber wrote:
Hi all,
I have a table that has about 73 million rows in it and growing.
How big is the table in MB? Its indexes?
Not sure on this. Will see if pgAdmin tells me.
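For reference, the built-in size functions will report this directly;
a quick check would be something like the following (the table name is
just a placeholder, not the real schema):

    -- table plus its indexes and TOAST data
    SELECT pg_size_pretty(pg_total_relation_size('order_detail'));
    -- heap only
    SELECT pg_size_pretty(pg_relation_size('order_detail'));
    -- all indexes on the table
    SELECT pg_size_pretty(pg_indexes_size('order_detail'));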
...
The server has 12GB RAM, 4 cores, but is shared with a big webapp
running in Tomcat -- and I only have a RAID1 disk to work on.
Woe is me...
By a RAID1 disk, do you mean two disks in a RAID1 configuration, or a
single RAID1 composed of an unspecified number of disks?
Often spending many thousands of dollars in DBA time can save you from
having to buy many hundreds of dollars in hard drives. :) On the
other hand, often you end up having to buy the extra disks anyway
after all.
I mean I have two disks in a RAID1 configuration. The server is
currently in a whitebox datacenter and I have zero control over the
hardware, so adding disks is unfortunately out of the question. I
completely understand the comment, and would love to have a larger SAN
available to me that I could configure... I just don't, and I have no
way of getting one anytime soon.
Anyway, this table is going to continue to grow, and it's used
frequently (Read and Write).
Are all rows in the table read and written with equal vigor, or are
there hot rows and cold rows that can be recognized based on the row's
values?
No, I could probably figure out a way to set up an "archive" or "older"
section of the data that is updated much less frequently. Deletes are
rare. Inserts/Updates "yes". Select on existing rows -- very frequent.
From what I read, this table is a candidate to be partitioned for
performance and scalability. I have tested some scripts to build
the "inherits" tables with their constraints and the
trigger/function to perform the work.
Am I doing the right thing by partitioning this?
Probably not. Or at least, you haven't given us the information to
know. Very broadly speaking, well-implemented partitioning makes bulk
loading and removal operations take less IO, but makes normal
operations take more IO, or if lucky leaves it unchanged. There are
exceptions, but unless you can identify a very specific reason to
think you might have one of those exceptions, then you probably don't.
I know you can't believe everything you read, but I thought I saw some
guidance that when a table's size exceeds some fraction of available
RAM, or when it approaches 100 million rows, it becomes a strong
candidate for partitioning.
Do you have a natural partitioning key? That is, is there a column
(or expression) which occurs as a selective component in the where
clause of almost all of your most IO-consuming SQL and DML? If so,
you might benefit from partitioning on it. (But in that case, you
might be able to get most of the benefits of partitioning, without the
headaches of it, just by revamping your indexes to include that
column/expression as their leading field).
If you don't have a good candidate partitioning key, then partitioning
will almost surely make things worse.
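If such a key does exist, the index change described above is mostly a
matter of putting that column first; a small sketch (the table, column,
and index names are assumptions, not taken from this thread):

    -- an index led by the candidate partitioning key; CONCURRENTLY
    -- avoids blocking writes while the index builds
    CREATE INDEX CONCURRENTLY order_detail_order_id_created_idx
        ON order_detail (order_id, created_at);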
The table is a "detail table" to its master records. That is, it's like
an order-details table, where 1-n detail rows join to the master
("order") table on the order-id. So I can partition it based on
the order number pretty easily (which is a bigint, btw).
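For concreteness, a minimal sketch of the inheritance-style setup being
discussed, keyed on that order-id (all table names, column names, and
range boundaries below are illustrative only):

    -- one child per order-id range, each with a CHECK constraint so
    -- constraint exclusion can skip irrelevant children at plan time
    CREATE TABLE order_detail_p00 (
        CHECK (order_id >= 0 AND order_id < 10000000)
    ) INHERITS (order_detail);

    CREATE TABLE order_detail_p01 (
        CHECK (order_id >= 10000000 AND order_id < 20000000)
    ) INHERITS (order_detail);

    -- route inserts against the parent to the right child
    CREATE OR REPLACE FUNCTION order_detail_insert() RETURNS trigger AS $$
    BEGIN
        IF NEW.order_id < 10000000 THEN
            INSERT INTO order_detail_p00 VALUES (NEW.*);
        ELSE
            INSERT INTO order_detail_p01 VALUES (NEW.*);
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER order_detail_insert_trg
        BEFORE INSERT ON order_detail
        FOR EACH ROW EXECUTE PROCEDURE order_detail_insert();

Note that with constraint_exclusion = partition (the default), the
planner only skips children when the query constrains order_id with
values it can compare against the CHECK constraints, which ties back to
the point above about the key appearing in the WHERE clause.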
If so, and I can afford some downtime, is dumping the table via
pg_dump and then loading it back in the best way to do this?
To do efficient bulk loading into a partitioned table, you need to
target each partition directly, rather than relying on the routing
trigger. That pretty much rules out pg_dump, AFAIK, unless you are
going to parse the dump file(s) and rewrite them.
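Concretely, the migration would go straight at the children instead of
through the parent's routing trigger; something along these lines
(names and ranges are again illustrative):

    -- move existing rows directly into a child, bypassing the trigger;
    -- ONLY restricts the scan to the parent's own rows
    INSERT INTO order_detail_p00
        SELECT * FROM ONLY order_detail
        WHERE order_id >= 0 AND order_id < 10000000;

    -- or COPY a per-range extract into each child from psql
    \copy order_detail_p01 from 'order_detail_p01.copy'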
Should I run a cluster or vacuum full after all is done?
Probably not. If a cluster after the partitioning would be
beneficial, there would be a pretty good chance you could do a cluster
*instead* of the partitioning and get the same benefit.
I did try clustering the table on the PK (which is actually 4 columns),
and it appeared to help a bit. I was hoping partitioning was going to
help me even more.
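For reference, that pass is roughly the following (the index name is a
guess at the PK index); note that CLUSTER takes an exclusive lock for
its duration, and the ordering decays as new rows arrive, so it would
need to be repeated periodically:

    CLUSTER order_detail USING order_detail_pkey;
    ANALYZE order_detail;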
If you do some massive deletes from the parent table as part of
populating the children, then a vacuum full of the parent could be
useful. But if you dump the parent table, truncate it, and reload it
as partitioned tables, then vacuum full would probably not be useful.
Really, you need to identify your most resource-intensive queries
before you can make any reasonable decisions.
Is there a major benefit if I can upgrade to 9.2.x in some way
that I haven't realized?
If you have specific queries that are misoptimized and so are
generating more IO than they need to, then upgrading could help. On
the other hand, it could also make things worse, if a currently
well-optimized query picks up a worse plan.
Is there some new feature or optimization you're thinking about with
this comment? If so, could you please just send me a link and/or
feature name and I'll google it myself?
But, instrumentation has improved in 9.2 from 9.0, so upgrading would
make it easier to figure out just which queries are really bad and
have the most opportunity for improvement. A little well informed
optimization might obviate the need for either partitioning or more
hard drives.
This is interesting too. I obviously would like the best available
options to tune the database and the application. Is this detailed in
the release notes somewhere, and what tools could I use to take
advantage of this? (Are there new/improved details included in the
EXPLAIN statement or something?)
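The pieces usually pointed at for 9.2 are pg_stat_statements (which
normalizes queries in 9.2), the new track_io_timing setting, and
EXPLAIN (ANALYZE, BUFFERS) for individual statements; a rough starting
point (the sample query is only illustrative):

    # postgresql.conf
    shared_preload_libraries = 'pg_stat_statements'
    track_io_timing = on            # new in 9.2

    -- in the database
    CREATE EXTENSION pg_stat_statements;

    -- which statements cost the most time and read the most blocks
    SELECT query, calls, total_time, shared_blks_read
      FROM pg_stat_statements
     ORDER BY total_time DESC
     LIMIT 10;

    -- then run EXPLAIN (ANALYZE, BUFFERS) on the worst offenders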
Finally, if anyone has any comments about my settings listed above
that might help improve performance, I thank you in advance.
Your default statistics target seemed low. Without knowing the nature
of your most resource intensive queries or how much memory tomcat is
using, it is hard to say more.
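Raising that is either a config change or a per-column tweak on the big
table; a sketch (the numbers and names here are only examples):

    # postgresql.conf (the default is 100)
    default_statistics_target = 250

    -- or only on the busiest columns of the large table
    ALTER TABLE order_detail ALTER COLUMN order_id SET STATISTICS 500;
    ANALYZE order_detail;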
Tomcat uses 4GB of RAM, plus we have nginx in front using a little,
and some other smaller services running on the server in addition to
the usual gamut of Linux processes.
Cheers,
Jeff