I'm having performance issues with a table consisting of 2,043,133 rows. The
schema is:

\d address
                  Table "public.address"
        Column        |          Type          | Modifiers 
 postcode_top         | character varying(2)   | not null
 postcode_middle      | character varying(4)   | not null
 postcode_bottom      | character varying(7)   | not null
 postcode             | character varying(10)  | not null
 property_type        | character varying(15)  | not null
 sale_type            | character varying(10)  | not null
 flat_extra           | character varying(100) | not null
 number               | character varying(100) | not null
 street               | character varying(100) | not null
 locality_1           | character varying(100) | not null
 locality_2           | character varying(100) | not null
 city                 | character varying(100) | not null
 county               | character varying(100) | not null
    "address_city_index" btree (city)
    "address_county_index" btree (county)
    "address_locality_1_index" btree (locality_1)
    "address_locality_2_index" btree (locality_2)
    "address_pc_bottom_index" btree (postcode_bottom)
    "address_pc_middle_index" btree (postcode_middle)
    "address_pc_top_index" btree (postcode_top)
    "address_pc_top_middle_bottom_index" btree (postcode_top,
                             postcode_middle, postcode_bottom)
    "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
    "address_postcode_index" btree (postcode)
    "address_property_type_index" btree (property_type)
    "address_street_index" btree (street)
    "street_prefix" btree (lower("substring"((street)::text, 1, 1)))

This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a
SATA harddrive.

Queries such as:

select locality_2 from address where locality_2 = 'Manchester';

are taking 14 seconds to complete, and this is only 2 years worth of
data - we will have up to 15 years (so over 15 million rows).

Interestingly, doing:
explain select locality_2 from address where locality_2 = 'Manchester';
                           QUERY PLAN                           
 Seq Scan on address  (cost=0.00..80677.16 rows=27923 width=12)
   Filter: ((locality_2)::text = 'Manchester'::text)

explain select locality_1 from address where locality_1 = 'Manchester';
                           QUERY PLAN                                           
 Index Scan using address_locality_1_index on address
(cost=0.00..69882.18 rows=17708 width=13)
   Index Cond: ((locality_1)::text = 'Manchester'::text)

Sadly, using the index makes things worse, the query taking 17 seconds.

locality_1 has 16650 distinct values and locality_2 has 1156 distinct

Whilst the locality_2 query is in progress, both the disk and the CPU
are maxed out with the disk constantly reading at 60MB/s and the CPU
rarely dropping under 100% load.

With the locality_1 query in progress, the CPU is maxed out but the disk
is reading at just 3MB/s.

Obviously, to me, this is a problem, I need these queries to be under a
second to complete. Is this unreasonable? What can I do to make this "go
faster"? I've considered normalising the table but I can't work out
whether the slowness is in dereferencing the pointers from the index
into the table or in scanning the index in the first place. And
normalising the table is going to cause much pain when inserting values
and I'm not entirely sure if I see why normalising it should cause a
massive performance improvement.

I need to get to the stage where I can run queries such as:
select street, locality_1, locality_2, city from address 
where (city = 'Nottingham' or locality_2 = 'Nottingham'
       or locality_1 = 'Nottingham')
  and upper(substring(street from 1 for 1)) = 'A' 
group by street, locality_1, locality_2, city
order by street
limit 20 offset 0

and have the results very quickly.

Any help most gratefully received (even if it's to say that I should be
posting to a different mailing list!).

Many thanks,


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to