Re: [PERFORM] Massive performance issues

Ron Thu, 01 Sep 2005 14:03:48 -0700

This should be able to run _very_ fast.

At 01:42 PM 9/1/2005, Matthew Sackman wrote:

Hi,


I'm having performance issues with a table consisting of 2,043,133 rows. The
schema is:

\d address
                  Table "public.address"
        Column        |          Type          | Modifiers
----------------------+------------------------+-----------
 postcode_top         | character varying(2)   | not null
 postcode_middle      | character varying(4)   | not null
 postcode_bottom      | character varying(7)   | not null
 postcode             | character varying(10)  | not null
 property_type        | character varying(15)  | not null
 sale_type            | character varying(10)  | not null
 flat_extra           | character varying(100) | not null
 number               | character varying(100) | not null
 street               | character varying(100) | not null
 locality_1           | character varying(100) | not null
 locality_2           | character varying(100) | not null
 city                 | character varying(100) | not null
 county               | character varying(100) | not null
Indexes:
    "address_city_index" btree (city)
    "address_county_index" btree (county)
    "address_locality_1_index" btree (locality_1)
    "address_locality_2_index" btree (locality_2)
    "address_pc_bottom_index" btree (postcode_bottom)
    "address_pc_middle_index" btree (postcode_middle)
    "address_pc_top_index" btree (postcode_top)
    "address_pc_top_middle_bottom_index" btree (postcode_top,
                             postcode_middle, postcode_bottom)
    "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
    "address_postcode_index" btree (postcode)
    "address_property_type_index" btree (property_type)
    "address_street_index" btree (street)
    "street_prefix" btree (lower("substring"((street)::text, 1, 1)))

IOW, each row takes ~1KB on HD. First suggestion: format your HD touse 8KB pages with 1KB segments. That'll out each row down on HD asan atomic unit. 8KB pages also "play nice" with pg.

At 1KB per row, this table takes up ~2.1GB and should fit into RAMfairly easily on a decently configured DB server (my _laptop_ has 2GBof RAM after all...)

Since you are using ~2.1GB for 2 years worth of data, 15 years worthshould take no more than 2.1GB*7.5= 15.75GB.

If you replace some of those 100 char fields with integers for codenumbers and have an auxiliary table for each of those fields mappingthe code numbers to the associated 100 char string, you should beable to shrink a row considerably. Your target is to have each rowtake <= 512B. Once a row fits into one 512B sector on HD, there's ano point in making it smaller unless you can shrink it enough to fit2 rows into one sector (<= 256B). Once two rows fit into one sector,there's no point shrinking a row unless you can make 3 rows fit intoa sector. Etc.

Assuming each 100 char (eg 100B) field can be replaced with a 4B int,each row could be as small as 76B. That makes 85B per row the goalas it would allow you to fit 6 rows per 512B HD sector. So in thebest case your table will be 12x smaller in terms of real HD space.

Fitting one (or more) row(s) into one sector will cut down the realspace used on HD for the table to ~7.88GB (or 1.32GB in the bestcase). Any such streamlining will make it faster to load, make theworking set that needs to be RAM for best performance smaller, etc, etc.

This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4and a SATA harddrive.

Upgrade pg to 8.0.3 and make sure you have enough RAM for your realday to day load. Frankly, RAM is so cheap ($75-$150/GB), I'd justupgrade the machine to 4GB as a matter of course. P4's have PAE, soif your mainboard can hold it, put more than 4GB of RAM in if youfind you need it.

Since you are describing your workload as being predominantly reads,you can get away with far less HD capability as long as you crank upRAM high enough to hold the working set of the DB. The indicationsfrom the OP are that you may very well be able to hold the entire DBin RAM. That's a big win whenever you can achieve it.

After these steps, there may still be performance issues that needattention, but the DBMS should be _much_ faster.


Ron Peacetree



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [PERFORM] Massive performance issues

Reply via email to