On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote:
> Ron wrote:
> > At 01:18 PM 8/19/2005, John A Meinel wrote:
> >> Jeremiah Jahn wrote:
> >> > Sorry about the formatting.
> >> >
> >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote:
> >> >
> >> >>Jeremiah Jahn wrote:
> >> >>
> >> >>
> >> ...
> >> >>The expensive parts are the 4915 lookups into the litigant_details
> >> (each
> >> >>one takes approx 4ms for a total of ~20s).
> >> >>And then you do it again on case_data (average 3ms each * 4906 loops =
> >> >>~15s).
> >> >
> >> > Is there some way to avoid this?
> >> >
> >> Well, in general, 3ms for a single lookup seems really long. Maybe your
> >> index is bloated by not vacuuming often enough. Do you tend to get a lot
> >> of updates to litigant_details?
> > Given that the average access time for a 15Krpm HD is in the 5.5-6ms
> > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single
> > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case)
> > table accesses is requiring a seek.
I think LVM may be a problem, since it also seems to break things up on
the file system. My access time on the seek should be around 1/7th the
15Krpm I believe since it's a 14 disk raid 10 array. And no other
traffic at the moment.
> Well, from what he has said, the total indexes are < 1GB and he has 6GB
> of ram. So everything should fit. Not to mention he is only accessing
> 5000/several million rows.
I table spaced some of the indexes and they are around 211066880 bytes
for the name_speed index and 149825330 for the lit_actor_speed index
tables seem to be about a gig.
> > This implies a poor match between physical layout and access pattern.
> This seems to be the case. But since this is not the only query, it may
> be that other access patterns are more important to optimize for.
> > If I understand correctly, the table should not be very fragmented given
> > that this is a reasonably freshly loaded DB? That implies that the
> > fields being looked up are not well sorted in the table compared to the
> > query pattern.
> > If the entire table could fit in RAM, this would be far less of a
> > consideration. Failing that, the physical HD layout has to be improved
> > or the query pattern has to be changed to reduce seeks.
> >> After CLUSTER, the current data will stay clustered, but new data will
> >> not, so you have to continually CLUSTER, the same way that you might
> >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as
> >> expensive as a VACUUM FULL. Be aware of this, but it might vastly
> >> improve your performance, so it would be worth it.
> > CLUSTER can be a very large maintenance overhead/problem if the table(s)
> > in question actually need to be "continually" re CLUSTER ed.
> > If there is no better solution available, then you do what you have to,
> > but it feels like there should be a better answer here.
> > Perhaps the DB schema needs examining to see if it matches up well with
> > its real usage?
> > Ron Peacetree
> I certainly agree that CLUSTER is expensive, and is an on-going
> maintenance issue. If it is the normal access pattern, though, it may be
> worth it.
The query I've sent you is one of the most common I get just change the
name. I handle about 180K of them a day mostly between 8 and 5. The
clustering has never really been a problem. Like I said before I do it
about once a week. I handle about 3000 update an hour consisting of
about 1000-3000 statement per update. ie about 2.5 million updates per
hour. In the last few months or so I've filtered these down to about
400K update/delete/insert statements per hour.
> I also wonder, though, if his table is properly normalized. Which, as
> you mentioned, might lead to improved access patterns.
The system is about as normalized as I can get it. In general the layout
is the following:
courts have cases, cases have litigant_details. Actors have identities
Speak softly and carry a +6 two-handed sword.
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster