Re: [PERFORM] postgres performance: comparing 2 data centers

Greg Stark Fri, 04 Jun 2004 17:40:29 -0700

"Michael Nonemacher" <[EMAIL PROTECTED]> writes:

> Agreed.
> 
> We originally created the indexes this way because we sometimes do
> searches where one of the columns is constrained using =, and the other
> using a range search, but it's not clear to me how much Postgres
> understands multi-column indexes.  Will I get the gain I'd expect from a
> (member_id, group_id) index on a query like "where member_id = ? and
> group_id > ?"?


It will use them, whether you see a gain depends on the distribution of your
data. Does the group_id > ? exclude enough records that it's worth having to
do all the extra i/o the bigger index would require?

Personally I think the other poster was a bit hasty to assert unconditionally
that it's never worth it. If you have a lot of records for every member_id and
very few of which will be greater than '?' then it might be worth it. If
however you'll only ever have on the order of a hundred or fewer records per
member_id and a significant chunk of them will have group_id > '?' then it
will probably be a wash or worse.

There's another side to the story though. In a web site or other OLTP
application you may find you're better off with the multi-column index. Even
if it performs less well on average than the smaller single column index when
users have reasonable numbers of groups. That's becuase you're guaranteed
(assuming postgres is using it) that even if a user someday has an obscene
number of groups he won't suddenly break your web site by driving your
database into the ground.

There is a difference between latency and bandwidth, and between average and
worst-case. Sometimes it's necessary to keep an eye on worst-case scenarios
and not just average bandwidth. 

But that said. If you are reasonably certain that you'll never or rarely have
thousands of groups per user you're probably better off with the indexes the
other person described.

> I've since found a few other often-used tables where the reltuples
> counts generated by 'analyze' are off by a factor of 5 or more.  In the
> short term, I'm just trying to eliminate the automatic-analyzes where
> possible and make sure they're followed up quickly with a 'vacuum' where
> it's not possible.
>
> Is "analyze generating bad stats" a known issue?  Is there anything I
> could be doing to aggravate or work around the problem?

I would suggest trying a VACUUM FULL and then retrying the ANALYZE. I suspect
you might have a lot of dead tuples at the beginning of your table which is
confusing the sampling. If that's it, then yes it's known and in fact already
improved in what will be 7.5. You may be able to avoid the situation by
vacuuming more frequently.

If that doesn't solve it then I would suggest trying to raise the statistics
targets for the columns in question with 

 ALTER TABLE name ALTER  column SET STATISTICS integer

The default is 100 iirc. You could try 200 or even more.

-- 
greg


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] postgres performance: comparing 2 data centers

Reply via email to