On 2/6/04 12:30 AM, "Tom Lane" <[EMAIL PROTECTED]> wrote:

> Steve Lane <[EMAIL PROTECTED]> writes:
>> When I drop and rebuild the indexes, they take oddly varying amounts of time
>> to rebuild. I rebuilt them in the following order, with the following rough
>> times. I took a guess that the speed of the rebuild might be related to the
>> number of distinct values in the column -- this seems true in some cases but
>> not in others. Here are the times:
> 
>> id_response       38 secs (86000 distinct)
>> id_topic          33 secs (6 distinct)
>> id_survey         13 secs (1 distinct)
>> id_code           39 secs (1444 distinct)
>> id_item           40 secs (65 distinct)
>> id_administration 13 secs (1 distinct)
> 
> How many rows altogether in this table?  What sort_mem are you using
> (and have you tried altering it)?

Sorry, left out important info. I cleaned the table first (it's a temporary
copy), then imported one run of data, 86,000 rows.

I haven't checked sort_mem, will do so.
> 
> When I read your previous message I was wondering about different
> datatypes having different comparison costs, but since these are all
> integer columns that's clearly not the issue here.  I think you may
> be seeing something related to the number of initial runs created in the
> sorting algorithm --- with only one distinct value, there'd always be
> just one run and no need for any merge passes.
> 
> (Why are you bothering to index columns with only one or a few values in
> them, anyway?  Such an index will be useless for searches ...)

Again, I left out some useful details. Id_survey and id_administration will
be identical for a single run of data (which can generate 10-100K inserts).
And my tests were just one data set, imported into a cleaned table. Still,
the distinctness of these columns will be low -- on the order of one
distinct value per 10^4 rows. The others have levels of distinctness
proportional to what the above chart shows -- id_response is unique,
id_topic will have 4-8 values per 10^4 records, and so on.

And still, none of this explains to me why the index on id_item drove the
COPY cost up so apparently dramatically. I tried the COPY again both with
and without that one index and in both cases it caused COPY to go from 5 to
40+ seconds.

-- sgl


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html

Reply via email to