On 2/6/04 12:30 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote: > Steve Lane <[EMAIL PROTECTED]> writes: >> When I drop and rebuild the indexes, they take oddly varying amounts of time >> to rebuild. I rebuilt them in the following order, with the following rough >> times. I took a guess that the speed of the rebuild might be related to the >> number of distinct values in the column -- this seems true in some cases but >> not in others. Here are the times: > >> id_response 38 secs (86000 distinct) >> id_topic 33 secs (6 distinct) >> id_survey 13 secs (1 distinct) >> id_code 39 secs (1444 distinct) >> id_item 40 secs (65 distinct) >> id_administration 13 secs (1 distinct) > > How many rows altogether in this table? What sort_mem are you using > (and have you tried altering it)?
Sorry, left out important info. I cleaned the table first (it's a temporary copy), then imported one run of data, 86,000 rows. I haven't checked sort_mem, will do so. > > When I read your previous message I was wondering about different > datatypes having different comparison costs, but since these are all > integer columns that's clearly not the issue here. I think you may > be seeing something related to the number of initial runs created in the > sorting algorithm --- with only one distinct value, there'd always be > just one run and no need for any merge passes. > > (Why are you bothering to index columns with only one or a few values in > them, anyway? Such an index will be useless for searches ...) Again, I left out some useful details. Id_survey and id_administration will be identical for a single run of data (which can generate 10-100K inserts). And my tests were just one data set, imported into a cleaned table. Still, the distinctness of these columns will be low -- on the order of one distinct value per 10^4 rows. The others have levels of distinctness proportional to what the above chart shows -- id_response is unique, id_topic will have 4-8 values per 10^4 records, and so on. And still, none of this explains to me why the index on id_item drove the COPY cost up so apparently dramatically. I tried the COPY again both with and without that one index and in both cases it caused COPY to go from 5 to 40+ seconds. -- sgl ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html