I've been debating with a collegue who argues that indexing a boolean column is a BAD idea and that is will actually slow down queries.
My plan is to have a table with many rows sharing 'versions' (version/archive/history) of data where the most current row is the one where 'is_active' contains a true value.
If the table begins to look like this:
data_id(pk) | data_lookup_key | data_is_active | ... ------------+-----------------+----------------+-------- 1 | banana | false | ... 2 | banana | false | ... 3 | banana | false | ... 4 | banana | false | ... 5 | banana | false | ... 6 | banana | false | ... 7 | banana | false | ... 8 | banana | false | ... 9 | banana | true | ... 10 | apple | true | ... 11 | pear | false | ... 12 | pear | false | ... 13 | pear | false | ... 14 | pear | false | ... 15 | pear | false | ... ... 1000000 | pear | true | ...
Will an index on the 'data_is_active' column be used or work as I expect? I'm assuming that I may have a million entries sharing the same 'data_lookup_key' and I'll be using that to search for the active version of the row.
SELECT * FROM table WHERE data_lookup_key = 'pear' AND data_is_active IS TRUE;
Does it make sense to have an index on data_is_active?
Now, I've read that in some databases the index on a column that
has relatively even distribution of values over a small set of values
will not be efficient.
I bet this is in a FAQ somewhere. Can you point me in the right direction?
Dante
---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives?
http://archives.postgresql.org