"Michael Nonemacher" <[EMAIL PROTECTED]> writes:

> Agreed.
> We originally created the indexes this way because we sometimes do
> searches where one of the columns is constrained using =, and the other
> using a range search, but it's not clear to me how much Postgres
> understands multi-column indexes.  Will I get the gain I'd expect from a
> (member_id, group_id) index on a query like "where member_id = ? and
> group_id > ?"?

It will use them; whether you see a gain depends on the distribution of your
data. Does the group_id > ? clause exclude enough records to be worth all the
extra i/o the bigger index requires?
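To make the query shape concrete (the table name here is hypothetical;
member_id and group_id come from your description), the leading equality lets
the planner land on one member's slice of the index and then range-scan within
it:

```sql
-- Hypothetical table; column names are from the question above.
CREATE INDEX memberships_member_group_idx
    ON memberships (member_id, group_id);

-- Equality on the leading column plus a range on the second column
-- scans one contiguous slice of the index. EXPLAIN will show whether
-- the planner actually chooses it for your data distribution.
EXPLAIN ANALYZE
SELECT *
  FROM memberships
 WHERE member_id = 42
   AND group_id > 1000;
```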

Personally I think the other poster was a bit hasty to assert unconditionally
that it's never worth it. If you have a lot of records for every member_id, of
which very few will have group_id greater than '?', then it might be worth it.
If, however, you'll only ever have on the order of a hundred or fewer records
per member_id, and a significant chunk of them will have group_id > '?', then
it will probably be a wash or worse.

There's another side to the story though. In a web site or other OLTP
application you may find you're better off with the multi-column index, even
if it performs less well on average than the smaller single-column index when
users have reasonable numbers of groups. That's because you're guaranteed
(assuming postgres is using it) that even if a user someday has an obscene
number of groups, he won't suddenly break your web site by driving your
database into the ground.
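To sketch the worst-case point (table name and values are hypothetical):

```sql
-- With only an index on (member_id), every row for member 42 must be
-- fetched and then filtered on group_id, so the cost grows with that
-- member's total group count:
SELECT * FROM memberships
 WHERE member_id = 42 AND group_id > 1000;

-- With an index on (member_id, group_id), the scan touches only the
-- qualifying slice of the index, so one pathological user can't drag
-- the whole query down with him.
```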

There is a difference between latency and bandwidth, and between average and
worst-case. Sometimes it's necessary to keep an eye on worst-case scenarios
and not just average bandwidth. 

But that said, if you are reasonably certain that you'll never, or only
rarely, have thousands of groups per user, you're probably better off with the
indexes the other poster described.

> I've since found a few other often-used tables where the reltuples
> counts generated by 'analyze' are off by a factor of 5 or more.  In the
> short term, I'm just trying to eliminate the automatic-analyzes where
> possible and make sure they're followed up quickly with a 'vacuum' where
> it's not possible.
> Is "analyze generating bad stats" a known issue?  Is there anything I
> could be doing to aggravate or work around the problem?

I would suggest trying a VACUUM FULL and then retrying the ANALYZE. I suspect
you have a lot of dead tuples at the beginning of your table, which is
confusing the sampling. If that's it, then yes, it's a known issue, and in
fact it's already improved in what will become 7.5. You may be able to avoid
the situation by vacuuming more frequently.
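Something along these lines (table name hypothetical):

```sql
VACUUM FULL memberships;  -- rewrite the table, reclaiming dead tuples
ANALYZE memberships;      -- re-sample now that the dead space is gone
```

Afterwards you can check whether reltuples in pg_class is back in line with
the real row count.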

If that doesn't solve it then I would suggest raising the statistics targets
for the columns in question with something like

  ALTER TABLE tab ALTER COLUMN col SET STATISTICS 100;

and then re-running ANALYZE. The default target is 10 iirc. You could try 100
or 200, or even more.

