At 01:55 PM 8/18/2005, John Arbash Meinel wrote:
Jeremiah Jahn wrote:
>here's an example standard query. I really have to make the first hit go
>faster. The table is also clustered on full_name. 'Smith%'
>took 87 seconds on the first hit. I wonder if I set up my array wrong.
>I remember seeing something about DMA access versus something else, and
>chose DMA access. LVM maybe?
>
>
It would be nice if you would format your queries to be a little bit
easier to read before posting them.
However, if I am reading it correctly, the index scan on identity is not
your slow point. In fact, as near as I can tell, it only takes 52ms to
complete.
The expensive parts are the 4915 lookups into the litigant_details (each
one takes approx 4ms for a total of ~20s).
And then you do it again on case_data (average 3ms each * 4906 loops =
~15s).
How big are litigant_details and case_data? If they can fit in RAM,
preload them using methods like the "cat to /dev/null" trick and
those table lookups will be ~100-1000x faster. If they won't fit
into RAM but the machine can be expanded to hold enough RAM to fit
the tables, it's well worth the ~$75-$150/GB to upgrade the server so
that the tables will fit into RAM.
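The "cat to /dev/null" trick, as a minimal sketch (the data-directory
layout in the usage line is an assumption; point it at whatever files
actually back the tables on your install):

```shell
#!/bin/sh
# warm_cache.sh -- read each given file once so the kernel's page cache
# holds it; subsequent table lookups then hit RAM instead of disk.
# Usage: sh warm_cache.sh $PGDATA/base/<db_oid>/<relfilenode>*
for f in "$@"; do
    cat "$f" > /dev/null
done
```

This only helps while the pages stay cached, so re-run it after anything
that evicts them (reboots, large sequential scans of other tables).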
If they can't be made to fit into RAM as atomic entities, you have a
few choices:
A= Put the data tables and indexes on separate dedicated spindles and
put litigant_details and case_data each on their own dedicated
spindles. This will lower seek conflicts. If this requires
buying some more HDs, it's well worth it.
B= Break litigant_details and case_data into a set of smaller tables
(based on something sane like the first n characters of the primary key)
such that the smaller tables easily fit into RAM. Given that you've
said only 10GB/60GB is "hot", this could work very well. Combine it
with "A" above (put all the litigant_details sub tables on one
dedicated spindle set and all the case_data sub tables on another
spindle set) for added oomph.
C= Buy a SSD big enough to hold litigant_details and case_data and
put them there. Again, this can be combined with "A" and "B" above
to lessen the size of the SSD needed.
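For option B, a minimal sketch of the split (the two-way A-M/N-Z split and
the index names are illustrative; on PostgreSQL of this vintage you would
route queries to the sub-tables yourself, e.g. via a view or application
logic):

```sql
-- Split litigant_details into smaller tables by the first character of
-- the key, so each piece fits comfortably in RAM.
CREATE TABLE litigant_details_a_m AS
    SELECT * FROM litigant_details WHERE substr(actor_id, 1, 1) < 'N';
CREATE TABLE litigant_details_n_z AS
    SELECT * FROM litigant_details WHERE substr(actor_id, 1, 1) >= 'N';

-- Recreate the lookup index on each piece.
CREATE INDEX lit_a_m_actor_speed ON litigant_details_a_m (actor_id);
CREATE INDEX lit_n_z_actor_speed ON litigant_details_n_z (actor_id);
```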
So there is no need for preloading your indexes on the identity
table. It is definitely not the bottleneck.
So a few design bits, which may help your database. Why is
"actor_id" a text field instead of a number?
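If actor_id really is numeric data stored as text, converting it shrinks
the column and every index on it. A sketch, assuming PostgreSQL 8.0+
(ALTER COLUMN ... TYPE) and that every existing value casts cleanly:

```sql
-- Convert actor_id from text to integer (fails if any value isn't numeric).
ALTER TABLE litigant_details
    ALTER COLUMN actor_id TYPE integer USING actor_id::integer;
-- The matching column on identity must be converted too, or the join
-- will need casts and may not use the index.
ALTER TABLE identity
    ALTER COLUMN actor_id TYPE integer USING actor_id::integer;
```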
You could try creating an index on "litigant_details (actor_id,
court_ori)" so that it can do just an index lookup, rather than an
index + filter.
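Concretely (the index name here is made up):

```sql
-- Composite index so the court_ori restriction is checked inside the
-- index itself instead of as a post-scan filter on each of the ~5000 rows.
CREATE INDEX lit_actor_court_speed
    ON litigant_details (actor_id, court_ori);
ANALYZE litigant_details;
```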
Yes, that certainly sounds like it would be more efficient.
More importantly, though, the planner seems to think the join of
identity to litigant_details will only return 1 row, not 5000.
Do you regularly vacuum analyze your tables?
Just as a test, try running:
SET enable_nestloop TO off;
And then run EXPLAIN ANALYZE again, just to see if it is faster.
You probably need to increase some statistics targets, so that the
planner can design better plans.
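For example (the target value 100 is just a starting point; the default
is 10, and which columns most need it is a guess based on your plan):

```sql
-- Raise the per-column statistics target, then re-ANALYZE so the planner
-- sees the finer-grained histogram and stops estimating 1 row where it
-- will actually get ~5000.
ALTER TABLE identity ALTER COLUMN full_name SET STATISTICS 100;
ALTER TABLE litigant_details ALTER COLUMN actor_id SET STATISTICS 100;
ANALYZE identity;
ANALYZE litigant_details;
```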
>  ->  Nested Loop  (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>        ->  Nested Loop  (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>              ->  Nested Loop  (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>                    Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>                    ->  Index Scan using name_speed on identity  (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>                          Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>                          Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>                    ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>                          Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>                          Filter: ('IL081025J'::text = (court_ori)::text)
>              ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>                    Filter: ('IL081025J'::text = (id)::text)
>        ->  Index Scan using case_speed on case_data  (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>              Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))