On Sat, 2005-08-20 at 11:59 -0400, Ron wrote:
> At 04:11 PM 8/19/2005, Jeremiah Jahn wrote:
> >On Fri, 2005-08-19 at 14:23 -0500, John A Meinel wrote:
> > > Ron wrote:
> > > > At 01:18 PM 8/19/2005, John A Meinel wrote:
> > > >
> > > >> Jeremiah Jahn wrote:
> > > >> > Sorry about the formatting.
> > > >> >
> > > >> > On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote:
> > > >> >
> > > >> >>Jeremiah Jahn wrote:
> > > >> >>
> > > >> >>
> > > >>
> > > >> ...
> > > >>
> > > >> >>The expensive parts are the 4915 lookups into the litigant_details
> > > >> (each
> > > >> >>one takes approx 4ms for a total of ~20s).
> > > >> >>And then you do it again on case_data (average 3ms each * 4906 loops 
> > > >> >>=
> > > >> >>~15s).
> > > >> >
> > > >> > Is there some way to avoid this?
> > > >> >
> > > >>
> > > >> Well, in general, 3ms for a single lookup seems really long. Maybe your
> > > >> index is bloated by not vacuuming often enough. Do you tend to get a 
> > > >> lot
> > > >> of updates to litigant_details?
> > > >
> > > >
> > > > Given that the average access time for a 15Krpm HD is in the 5.5-6ms
> > > > range (7.5-8ms for a 10Krpm HD), having an average of 3ms for a single
> > > > lookup implies that ~1/2 (the 15Krpm case) or ~1/3 (the 10Krpm case)
> > > > table accesses is requiring a seek.
> > > >
> >I think LVM may be a problem, since it also seems to break things up on
> >the file system. My access time on the seek should be around 1/7th the
> >15Krpm I believe since it's a 14 disk raid 10 array. And no other
> >traffic at the moment.
> 
> Oops.  There's a misconception here.  RAID arrays increase 
> _throughput_ AKA _bandwidth_ through parallel access to HDs.  OTOH, 
> access time is _latency_, and that is not changed.  Access time for a 
> RAID set is equal to that of the slowest access time, AKA highest 
> latency, HD in the RAID set.

so I will max out at the 5.5-6ms rang for access time?


> 
> > > Well, from what he has said, the total indexes are < 1GB and he has 6GB
> > > of ram. So everything should fit. Not to mention he is only accessing
> > > 5000/several million rows.
> >I table spaced some of the indexes and they are around 211066880 bytes
> >for the name_speed index and 149825330 for the lit_actor_speed index
> >tables seem to be about a gig.
> 
> Hmm.  And you think you are only using 250MB out of your 6GB of 
> RAM?  Something doesn't seem to add up here.  From what's been 
> posted, I'd expect much more RAM to be in use.

the cached memory usage is complete using up the rest of the memory. 

> 
> 
> > > > This implies a poor match between physical layout and access pattern.
> > >
> > > This seems to be the case. But since this is not the only query, it may
> > > be that other access patterns are more important to optimize for.
> > >
> > > >
> > > > If I understand correctly, the table should not be very fragmented given
> > > > that this is a reasonably freshly loaded DB?  That implies that the
> > > > fields being looked up are not well sorted in the table compared to the
> > > > query pattern.
> > > >
> > > > If the entire table could fit in RAM, this would be far less of a
> > > > consideration.  Failing that, the physical HD layout has to be improved
> > > > or the query pattern has to be changed to reduce seeks.
> > > >
> > > >
> > >
> > > ...
> > >
> > > >> After CLUSTER, the current data will stay clustered, but new data will
> > > >> not, so you have to continually CLUSTER, the same way that you might
> > > >> VACUUM. *However*, IIRC CLUSTER grabs an Exclusive lock, so it is as
> > > >> expensive as a VACUUM FULL. Be aware of this, but it might vastly
> > > >> improve your performance, so it would be worth it.
> > > >
> > > >
> > > > CLUSTER can be a very large maintenance overhead/problem if the table(s)
> > > > in question actually need to be "continually" re CLUSTER ed.
> > > >
> > > > If there is no better solution available, then you do what you have to,
> > > > but it feels like there should be a better answer here.
> > > >
> > > > Perhaps the DB schema needs examining to see if it matches up well with
> > > > its real usage?
> > > >
> > > > Ron Peacetree
> > > >
> > >
> > > I certainly agree that CLUSTER is expensive, and is an on-going
> > > maintenance issue. If it is the normal access pattern, though, it may be
> > > worth it.
> >
> >The query I've sent you is one of the most common I get just change the
> >name. I handle about 180K of them a day mostly between 8 and 5. The
> >clustering has never really been a problem. Like I said before I do it
> >about once a week. I handle about 3000 update an hour consisting of
> >about 1000-3000 statement per update. ie about 2.5 million updates per
> >hour. In the last few months or so I've filtered these down to about
> >400K update/delete/insert statements per hour.
> 
> 2.5M updates per hour = ~695 updates per second.  400K per hour = 
> ~112 updates per sec.  These should be well within the capabilities 
> of a RAID 10 subsystem based on 14 15Krpm HDs assuming a decent RAID 
> card.  What is the exact HW of the RAID subsystem involved and how is 
> it configured?  You shouldn't be having a performance problem AFAICT...

dell perc4 with 14 drives and the each pair is raid 1 with spanning
enabled across all of the pairs. It doesn't say raid 10...But it seem to
be it. What else would you like to know?

> 
> > > I also wonder, though, if his table is properly normalized. 
> > Which, as you mentioned, might lead to improved access patterns.
> >The system is about as normalized as I can get it. In general the 
> >layout is the following:
> >courts have cases, cases have litigant_details. Actors have 
> >identities and litigant_details.
> 
> Hmmm.  Can you tell us more about the actual schema, I may have an idea...
In what format would you like it. What kind of things would you like to
know..? I've probably missed a few things, but this is what running on
the production box. There are no foreign keys. Cascading delete were far
too slow. And having to determine the order of deletes was a pain in the
but. 



CREATE TABLE actor (
    actor_id character varying(50) NOT NULL,
    case_id character varying(50) DEFAULT '0'::character varying NOT NULL,
    court_ori character varying(18) NOT NULL,
    role_class_code character varying(50) NOT NULL
);



CREATE TABLE identity (
    identity_id character varying(50) NOT NULL,
    actor_id character varying(50) NOT NULL,
    case_id character varying(50) DEFAULT '0'::character varying NOT NULL,
    court_ori character varying(18) NOT NULL,
    identity_type character varying(10) NOT NULL,
    entity_type character varying(50),
    full_name character varying(60) NOT NULL,
    entity_acronym character varying(50),
    name_prefix character varying(50),
    first_name character varying(50),
    middle_name character varying(50),
    last_name character varying(50),
    name_suffix character varying(50),
    gender_code character varying(50),
    date_of_birth date,
    place_of_birth character varying(50),
    height character varying(50),
    height_unit character varying(50),
    weight character varying(50),
    weight_unit character varying(50),
    religion character varying(50),
    ethnicity character varying(50),
    citizenship_country character varying(50),
    hair_color character varying(50),
    eye_color character varying(50),
    scars_marks_tatto character varying(255),
    marital_status character varying(50)
);
ALTER TABLE ONLY identity ALTER COLUMN full_name SET STATISTICS 1000;



CREATE TABLE case_data (
    case_id character varying(50) NOT NULL,
    court_ori character varying(18) NOT NULL,
    type_code character varying(50),
    subtype_code character varying(50),
    case_category character varying(50),
    case_title character varying(100),
    type_subtype_text character varying(255),
    case_year integer,
    extraction_datetime character varying(15) NOT NULL,
    update_date date NOT NULL,
    case_dom oid,
    data bytea
);



CREATE TABLE litigant_details (
    actor_id character varying(50) NOT NULL,
    case_id character varying(50) NOT NULL,
    court_ori character varying(18) NOT NULL,
    assigned_case_role character varying(50) NOT NULL,
    initial_file_date date,
    initial_close_date date,
    reopen_date date,
    reclose_date date,
    physical_file_location character varying(50),
    impound_litigant_data character varying(50),
    impound_litigant_minutes character varying(50),
    actor_type character varying(50) NOT NULL,
    conviction character varying(3)
);



CREATE TABLE actor_identifier (
    identity_id character varying(50) NOT NULL,
    actor_id character varying(50) NOT NULL,
    case_id character varying(50) DEFAULT '0'::character varying NOT NULL,
    court_ori character varying(18) NOT NULL,
    actor_identifier_type_code character varying(50) NOT NULL,
    actor_identifier_id character varying(50) NOT NULL
);



CREATE TABLE actor_relationship (
    litigant_actor_id character varying(50) NOT NULL,
    related_actor_id character varying(50) NOT NULL,
    case_id character varying(50) NOT NULL,
    court_ori character varying(18) NOT NULL,
    relationship_type character varying(50) NOT NULL
);

CREATE INDEX lit_actor_speed ON litigant_details USING btree (actor_id);

CREATE INDEX name_speed ON identity USING btree (full_name);
ALTER TABLE identity CLUSTER ON name_speed;

CREATE INDEX case_speed ON case_data USING btree (court_ori, case_id);
ALTER TABLE case_data CLUSTER ON case_speed;


ALTER TABLE ONLY actor
    ADD CONSTRAINT actor_pkey PRIMARY KEY (court_ori, case_id, actor_id);
ALTER TABLE ONLY identity
    ADD CONSTRAINT identity_pkey PRIMARY KEY (court_ori, case_id, identity_id, 
actor_id);
ALTER TABLE ONLY case_data
    ADD CONSTRAINT case_data_pkey PRIMARY KEY (court_ori, case_id);
ALTER TABLE ONLY litigant_details
    ADD CONSTRAINT litigant_details_pkey PRIMARY KEY (actor_id, case_id, 
court_ori);



> 
> >  >
> > > John
> > > =:->
> >--
> >Speak softly and carry a +6 two-handed sword.
> 
> Nah.  A wand of 25th level automatic Magic Missile Fire ;-)
> 
> Ron Peacetree
> 
-- 
Speak softly and carry a +6 two-handed sword.


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to