This should be able to run _very_ fast.

At 01:42 PM 9/1/2005, Matthew Sackman wrote:

I'm having performance issues with a table consisting of 2,043,133 rows. The
schema is:

\d address
                  Table "public.address"
        Column        |          Type          | Modifiers
----------------------+------------------------+-----------
 postcode_top         | character varying(2)   | not null
 postcode_middle      | character varying(4)   | not null
 postcode_bottom      | character varying(7)   | not null
 postcode             | character varying(10)  | not null
 property_type        | character varying(15)  | not null
 sale_type            | character varying(10)  | not null
 flat_extra           | character varying(100) | not null
 number               | character varying(100) | not null
 street               | character varying(100) | not null
 locality_1           | character varying(100) | not null
 locality_2           | character varying(100) | not null
 city                 | character varying(100) | not null
 county               | character varying(100) | not null
Indexes:
    "address_city_index" btree (city)
    "address_county_index" btree (county)
    "address_locality_1_index" btree (locality_1)
    "address_locality_2_index" btree (locality_2)
    "address_pc_bottom_index" btree (postcode_bottom)
    "address_pc_middle_index" btree (postcode_middle)
    "address_pc_top_index" btree (postcode_top)
    "address_pc_top_middle_bottom_index" btree (postcode_top,
                             postcode_middle, postcode_bottom)
    "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
    "address_postcode_index" btree (postcode)
    "address_property_type_index" btree (property_type)
    "address_street_index" btree (street)
    "street_prefix" btree (lower("substring"((street)::text, 1, 1)))

IOW, each row takes ~1KB on HD. First suggestion: format your HD to use 8KB pages with 1KB segments. That'll put each row down on HD as an atomic unit, and 8KB pages also "play nice" with pg.

At 1KB per row, this table takes up ~2.1GB and should fit into RAM fairly easily on a decently configured DB server (my _laptop_ has 2GB of RAM after all...)

Since you are using ~2.1GB for 2 years' worth of data, 15 years' worth should take no more than 2.1GB * 7.5 = 15.75GB.

If you replace some of those 100-char fields with integer code numbers, and keep an auxiliary table for each such field mapping the code numbers to the associated 100-char strings, you should be able to shrink a row considerably. Your target is to have each row take <= 512B. Once a row fits into one 512B sector on HD, there's no point in making it smaller unless you can shrink it enough to fit 2 rows into one sector (<= 256B). Once two rows fit into one sector, there's no point shrinking a row further unless you can make 3 rows fit into a sector, and so on.
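As a sketch of what that normalization might look like (the table and column names here are hypothetical; the same pattern applies to street, county, locality_1, locality_2, etc.):

```sql
-- Hypothetical lookup table for one of the 100-char fields (city).
CREATE TABLE city_lookup (
    city_id  serial PRIMARY KEY,
    city     character varying(100) NOT NULL UNIQUE
);

-- Populate it from the existing data, then swap the ~100B string
-- in address for a 4B integer code:
INSERT INTO city_lookup (city) SELECT DISTINCT city FROM address;

ALTER TABLE address ADD COLUMN city_id integer;
UPDATE address a SET city_id = c.city_id
  FROM city_lookup c WHERE a.city = c.city;
ALTER TABLE address DROP COLUMN city;
```

Queries that need the text then join back through the lookup table, which will itself be tiny and stay cached.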

Assuming each 100-char (i.e., ~100B) field can be replaced with a 4B int, each row could be as small as 76B. That makes 85B per row the goal, since it would allow you to fit 6 rows per 512B HD sector. So in the best case your table will be ~12x smaller in terms of real HD space.
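The arithmetic behind those numbers, as a quick check (column widths from the schema above; seven of the columns are varchar(100)):

```sql
-- 48B of short fields (2+4+7+10+15+10) plus seven 100-char fields
-- replaced by 4B ints = 76B/row; floor(512/6) = 85B is the
-- 6-rows-per-sector target.
SELECT (2+4+7+10+15+10) + 7*4 AS min_bytes_per_row,    -- 76
       512 / 6                AS target_bytes_per_row; -- 85
```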

Fitting one (or more) row(s) into one sector will cut down the real space used on HD for the table to ~7.88GB (or 1.32GB in the best case). Any such streamlining will make it faster to load, make the working set that needs to be in RAM for best performance smaller, etc, etc.

This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a SATA harddrive.

Upgrade pg to 8.0.3 and make sure you have enough RAM for your real day-to-day load. Frankly, RAM is so cheap ($75-$150/GB) that I'd just upgrade the machine to 4GB as a matter of course. P4s have PAE, so if your mainboard can hold it, put more than 4GB of RAM in if you find you need it.
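With the RAM in place, tell pg about it. A sketch for a 4GB machine (the values are illustrative assumptions, not measured recommendations; note that in 8.0-era configs both settings are counts of 8kB pages, not byte quantities):

```
# postgresql.conf (8.0-era syntax)
shared_buffers = 32768          # 32768 x 8kB = 256MB of shared buffer cache
effective_cache_size = 393216   # ~3GB; tells the planner how much of the
                                # DB the OS cache can be expected to hold
```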

Since you are describing your workload as being predominantly reads, you can get away with far less HD capability as long as you crank up RAM high enough to hold the working set of the DB. The indications from the OP are that you may very well be able to hold the entire DB in RAM. That's a big win whenever you can achieve it.

After these steps, there may still be performance issues that need attention, but the DBMS should be _much_ faster.

Ron Peacetree

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
