At 01:55 PM 8/18/2005, John Arbash Meinel wrote:
Jeremiah Jahn wrote:

>here's an example standard query. Ireally have to make the first hit go
>faster. The table is clustered as well on full_name as well. 'Smith%'
>took 87 seconds on the first hit. I wonder if I set up may array wrong.
>I remeber see something about DMA access versus something else, and
>choose DMA access. LVM maybe?
>
>
It would be nice if you would format your queries to be a little bit
easier to read before posting them.
However, I believe I am reading it correctly, to say that the index scan
on identity is not your slow point. In fact, as near as I can tell, it
only takes 52ms to complete.

The expensive parts are the 4915 lookups into the litigant_details (each
one takes approx 4ms for a total of ~20s).
And then you do it again on case_data (average 3ms each * 4906 loops =
~15s).

How big are litigant_details and case_data? If they can fit in RAM, preload them using methods like the "cat to /dev/null" trick and those table lookups will be ~100-1000x faster. If they won't fit into RAM but the machine can be expanded to hold enough RAM to fit the tables, it's well worth the ~$75-$150/GB to upgrade the server so that the tables will fit into RAM.

If they can't be made to fit into RAM as atomic entities, you have a few choices: A= Put the data tables and indexes on separate dedicated spindles and put litigant_details and case_data each on their own dedicated spindles. This will lower seek conflicts. Again it this requires buying some more HDs, it's well worth it.

B= Break litigant_details and case_data into a set of smaller tables (based on something sane like the first n characters of the primary key) such that the smaller tables easily fit into RAM. Given that you've said only 10GB/60GB is "hot", this could work very well. Combine it with "A" above (put all the litigant_details sub tables on one dedicated spindle set and all the case_data sub tables on another spindle set) for added oomph.

C= Buy a SSD big enough to hold litigant_details and case_data and put them there. Again, this can be combined with "A" and "B" above to lessen the size of the SSD needed.


So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.

So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? You could try creating an index on "litigant_details (actor_id, count_ori)" so that it can do just an index lookup, rather than an index+ filter.

Yes, that certainly sounds like it would be more efficient.


More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000.
Do you regularly vacuum analyze your tables?
Just as a test, try running:
set enable_nested_loop to off;
And then run EXPLAIN ANALYZE again, just to see if it is faster.

You probably need to increase some statistics targets, so that the
planner can design better plans.

> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1) > -> Nested Loop (cost=0.00..20406.48 rows=1 width=159)(actual time=12.826..23232.106 rows=4906 loops=1) > -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1) > Join Filter: (("outer".case_id)::text = ("inner".case_id)::text) > -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538
> rows=4915 loops=1)
> Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying)) > Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text)) > -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual
> time=4.631..4.635 rows=1 loops=4915)
> Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>                         Filter: ('IL081025J'::text = (court_ori)::text)
> -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>                              Filter: ('IL081025J'::text = (id)::text)
> -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058
> rows=1 loops=4906)
> Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text =
 > ("outer".case_id)::text))




---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Reply via email to