On Sun, 24 Oct 2004, John Meinel wrote:
I was looking into another problem, and I found something that surprised me. If I'm doing "SELECT * FROM mytable WHERE col = 'myval' LIMIT 1.". Now "col" is indexed... The real purpose of this query is to check to see if a value exists in the column,...
When you select all the columns, you're going to force it to go to the table. If you select only the indexed column, it ought to be able to use just the index, and never read the table at all. You could also use more standard and more set-oriented SQL while you're at it:
SELECT DISTINCT(col) FROM mytable WHERE col = 'myval'
Well, what you wrote was actually much slower, as it had to scan the whole table, grab all the rows, and then distinct them in the end.
However, this query worked:
SELECT DISTINCT(col) FROM mytable WHERE col = 'myval' LIMIT 1;
Now, *why* that works differently from:
SELECT col FROM mytable WHERE col = 'myval' LIMIT 1; or SELECT DISTINCT(col) FROM mytable WHERE col = 'myval';
I'm not sure. They all return the same information.
What's also weird is stuff like: SELECT DISTINCT(NULL) FROM mytable WHERE col = 'myval' LIMIT 1;
Also searches the entire table, sorting that NULL == NULL wherever col = 'myval'. Which is as expensive as the non-limited case (I'm guessing that the limit is occurring after the distinct, which is causing the problem. SELECT NULL FROM ... still uses a sequential scan, but it stops after finding the first one.)
Actually, in doing a little bit more testing, the working query only works on some of the values. Probably it just increases the expense enough that it switches over. It also has the downside that when it does switch to seq scan, it is much more expensive as it has to do a sort and a unique on all the entries.
Description: OpenPGP digital signature