I haven't come up with any great ideas for this one. It might be interesting to compare the explain analyze output from the distinct on query with and without seqscans enabled.
After digging through planner code, I found that bumping up the sort_mem will make the planner prefer a full table scan and hashed aggregation. The sort memory is where the hash table is stored. In the end, the query runs in 4.5 minutes, which is reasonable.
I had planned to try Manfred's index correlation patch to see if it would give better estimates for an index scan. The index scan method took maybe 6.5x as long, but the estimate was that it would take 1400x as long. I think instead of trying out his patch I might actually work on my application!
Ken
---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings