On May 13, 2013, at 4:24 PM, Jeff Janes wrote:
> On Mon, May 13, 2013 at 3:36 PM, Mike McCann <[email protected]> wrote:
>
> Increasing work_mem to 355 MB improves the performance by a factor of 2:
>
> stoqs_march2013_s=# set work_mem='355MB';
> SET
> stoqs_march2013_s=# explain analyze select * from stoqs_measuredparameter
> order by datavalue;
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=422106.15..430560.68 rows=3381814 width=20) (actual
> time=2503.078..2937.130 rows=3381814 loops=1)
> Sort Key: datavalue
> Sort Method: quicksort Memory: 362509kB
> -> Seq Scan on stoqs_measuredparameter (cost=0.00..55359.14 rows=3381814
> width=20) (actual time=0.016..335.745 rows=3381814 loops=1)
> Total runtime: 3094.601 ms
> (5 rows)
>
> I tried changing random_page_cost to from 4 to 1 and saw no change.
>
> I'm wondering now what changes might get this query to run in less than one
> second.
>
>
> I think you are worrying about the wrong thing here. What is a web app going
> to do with 3,381,814 rows, once it obtains them? Your current testing is not
> testing the time it takes to stream that data to the client, or for the
> client to do something meaningful with that data.
>
> If you only plan to actually fetch a few dozen of those rows, then you
> probably need to incorporate that into your test, either by using a LIMIT, or
> by using a mock-up of the actual application to do some timings.
>
> Also, what is the type and collation of the column you are sorting on?
> non-'C' collations of text columns sort about 3 times slower than 'C'
> collation does.
>
>
> If all the data is in memory, then will faster CPU and memory be the things
> that help?
>
> Yes, those would help (it is not clear to me which of the two would help
> more), but I think you need to rethink your design of sending the entire
> database table to the application server for each page-view.
>
>
> Cheers,
>
> Jeff
Hi Jeff,
The datavalue column is double precision:
stoqs_march2013_s=# \d+ stoqs_measuredparameter
Table
"public.stoqs_measuredparameter"
Column | Type | Modifiers
| Storage | Description
----------------+------------------+----------------------------------------------------------------------+---------+-------------
id | integer | not null default
nextval('stoqs_measuredparameter_id_seq'::regclass) | plain |
measurement_id | integer | not null
| plain |
parameter_id | integer | not null
| plain |
datavalue | double precision | not null
| plain |
Indexes:
"stoqs_measuredparameter_pkey" PRIMARY KEY, btree (id)
"stoqs_measuredparameter_measurement_id_parameter_id_key" UNIQUE
CONSTRAINT, btree (measurement_id, parameter_id)
"stoqs_measuredparameter_datavalue" btree (datavalue)
"stoqs_measuredparameter_measurement_id" btree (measurement_id)
"stoqs_measuredparameter_parameter_id" btree (parameter_id)
Foreign-key constraints:
"stoqs_measuredparameter_measurement_id_fkey" FOREIGN KEY (measurement_id)
REFERENCES stoqs_measurement(id) DEFERRABLE INITIALLY DEFERRED
"stoqs_measuredparameter_parameter_id_fkey" FOREIGN KEY (parameter_id)
REFERENCES stoqs_parameter(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no
Thanks for the suggestion and advice to examine the web app performance. We've
actually taken quite a few steps to optimize how the web app works. The example
query I provided is a simple worst-case one that we can use to help us decide
on the proper hardware. An actual query performed by the web app is:
stoqs_march2013_s=# explain analyze SELECT stoqs_measuredparameter.id,
stoqs_march2013_s-# stoqs_parameter.name AS parameter__name,
stoqs_march2013_s-# stoqs_parameter.standard_name AS
parameter__standard_name,
stoqs_march2013_s-# stoqs_measurement.depth AS measurement__depth,
stoqs_march2013_s-# stoqs_measurement.geom AS measurement__geom,
stoqs_march2013_s-# stoqs_instantpoint.timevalue AS
measurement__instantpoint__timevalue,
stoqs_march2013_s-# stoqs_platform.name AS
measurement__instantpoint__activity__platform__name,
stoqs_march2013_s-# stoqs_measuredparameter.datavalue AS datavalue,
stoqs_march2013_s-# stoqs_parameter.units AS parameter__units
stoqs_march2013_s-# FROM stoqs_parameter p1,
stoqs_march2013_s-# stoqs_measuredparameter
stoqs_march2013_s-# INNER JOIN stoqs_measurement ON
(stoqs_measuredparameter.measurement_id = stoqs_measurement.id)
stoqs_march2013_s-# INNER JOIN stoqs_instantpoint ON
(stoqs_measurement.instantpoint_id = stoqs_instantpoint.id)
stoqs_march2013_s-# INNER JOIN stoqs_parameter ON
(stoqs_measuredparameter.parameter_id = stoqs_parameter.id)
stoqs_march2013_s-# INNER JOIN stoqs_activity ON
(stoqs_instantpoint.activity_id = stoqs_activity.id)
stoqs_march2013_s-# INNER JOIN stoqs_platform ON
(stoqs_activity.platform_id = stoqs_platform.id)
stoqs_march2013_s-# INNER JOIN stoqs_measuredparameter mp1 ON
mp1.measurement_id = stoqs_measuredparameter.measurement_id
stoqs_march2013_s-# WHERE (p1.name = 'sea_water_sigma_t')
stoqs_march2013_s-# AND (mp1.datavalue > 25.19)
stoqs_march2013_s-# AND (mp1.datavalue < 26.01)
stoqs_march2013_s-# AND (mp1.parameter_id = p1.id)
stoqs_march2013_s-# AND (stoqs_instantpoint.timevalue <= '2013-03-17
19:05:06'
stoqs_march2013_s(# AND stoqs_instantpoint.timevalue >= '2013-03-17
15:35:13'
stoqs_march2013_s(# AND stoqs_parameter.name IN ('fl700_uncorr')
stoqs_march2013_s(# AND stoqs_measurement.depth >= -1.88
stoqs_march2013_s(# AND stoqs_platform.name IN ('dorado')
stoqs_march2013_s(# AND stoqs_measurement.depth <= 83.57)
stoqs_march2013_s-# ORDER BY stoqs_activity.name ASC,
stoqs_instantpoint.timevalue ASC;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------
Sort (cost=10741.41..10741.42 rows=1 width=1282) (actual
time=770.211..770.211 rows=0 loops=1)
Sort Key: stoqs_activity.name, stoqs_instantpoint.timevalue
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=3002.89..10741.40 rows=1 width=1282) (actual
time=770.200..770.200 rows=0 loops=1)
Hash Cond: (stoqs_instantpoint.activity_id = stoqs_activity.id)
-> Nested Loop (cost=2983.69..10722.19 rows=3 width=954) (actual
time=770.036..770.036 rows=0 loops=1)
-> Nested Loop (cost=2983.69..9617.36 rows=191 width=946)
(actual time=91.369..680.072 rows=20170 loops=1)
-> Hash Join (cost=2983.69..8499.07 rows=193 width=842)
(actual time=91.346..577.633 rows=20170 loops=1)
Hash Cond: (stoqs_measuredparameter.parameter_id =
stoqs_parameter.id)
-> Nested Loop (cost=2982.38..8478.47 rows=4628
width=24) (actual time=91.280..531.408 rows=197746 loops=1)
-> Nested Loop (cost=2982.38..4862.37
rows=512 width=4) (actual time=91.202..116.140 rows=20170 loops=1)
-> Seq Scan on stoqs_parameter p1
(cost=0.00..1.30 rows=1 width=4) (actual time=0.002..0.011 rows=1 loops=1)
Filter: ((name)::text =
'sea_water_sigma_t'::text)
-> Bitmap Heap Scan on
stoqs_measuredparameter mp1 (cost=2982.38..4854.40 rows=534 width=8) (actual
time=91.194..109.846 rows=20170 loop
s=1)
Recheck Cond: ((datavalue >
25.19::double precision) AND (datavalue < 26.01::double precision) AND
(parameter_id = p1.id))
-> BitmapAnd
(cost=2982.38..2982.38 rows=534 width=0) (actual time=90.794..90.794 rows=0
loops=1)
-> Bitmap Index Scan on
stoqs_measuredparameter_datavalue (cost=0.00..259.54 rows=12292 width=0)
(actual time=62.769..62.769
rows=23641 loops=1)
Index Cond:
((datavalue > 25.19::double precision) AND (datavalue < 26.01::double
precision))
-> Bitmap Index Scan on
stoqs_measuredparameter_parameter_id (cost=0.00..2719.38 rows=147035 width=0)
(actual time=27.412..2
7.412 rows=34750 loops=1)
Index Cond:
(parameter_id = p1.id)
-> Index Scan using
stoqs_measuredparameter_measurement_id on stoqs_measuredparameter
(cost=0.00..6.98 rows=7 width=20) (actual time=0.008..0.
017 rows=10 loops=20170)
Index Cond: (measurement_id =
mp1.measurement_id)
-> Hash (cost=1.30..1.30 rows=1 width=826) (actual
time=0.012..0.012 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on stoqs_parameter
(cost=0.00..1.30 rows=1 width=826) (actual time=0.007..0.010 rows=1 loops=1)
Filter: ((name)::text =
'fl700_uncorr'::text)
-> Index Scan using stoqs_measurement_pkey on
stoqs_measurement (cost=0.00..5.78 rows=1 width=116) (actual time=0.004..0.004
rows=1 loops=20170)
Index Cond: (id =
stoqs_measuredparameter.measurement_id)
Filter: ((depth >= (-1.88)::double precision) AND
(depth <= 83.57::double precision))
-> Index Scan using stoqs_instantpoint_pkey on
stoqs_instantpoint (cost=0.00..5.77 rows=1 width=16) (actual time=0.004..0.004
rows=0 loops=20170)
Index Cond: (id = stoqs_measurement.instantpoint_id)
Filter: ((timevalue <= '2013-03-17 19:05:06-07'::timestamp
with time zone) AND (timevalue >= '2013-03-17 15:35:13-07'::timestamp with time
zone))
-> Hash (cost=18.82..18.82 rows=30 width=336) (actual
time=0.151..0.151 rows=7 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Hash Join (cost=1.09..18.82 rows=30 width=336) (actual
time=0.035..0.145 rows=7 loops=1)
Hash Cond: (stoqs_activity.platform_id = stoqs_platform.id)
-> Seq Scan on stoqs_activity (cost=0.00..16.77 rows=177
width=66) (actual time=0.005..0.069 rows=177 loops=1)
-> Hash (cost=1.07..1.07 rows=1 width=278) (actual
time=0.014..0.014 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Seq Scan on stoqs_platform (cost=0.00..1.07
rows=1 width=278) (actual time=0.008..0.012 rows=1 loops=1)
Filter: ((name)::text = 'dorado'::text)
Total runtime: 770.445 ms
(42 rows)
We assume that steps taken to improve the worst-case query scenario will also
improve these kind of queries. If anything above pops out as needing better
planning please let us know that too!
Thanks,
Mike
--
Mike McCann
Software Engineer
Monterey Bay Aquarium Research Institute
7700 Sandholdt Road
Moss Landing, CA 95039-9644
Voice: 831.775.1769 Fax: 831.775.1736 http://www.mbari.org