I am fairly new to squeezing performance out of Postgres, but I hope this mailing list can help me. I have read the instructions at http://wiki.postgresql.org/wiki/Slow_Query_Questions and have tried to abide by them as best I can.

I am running "PostgreSQL 9.1.7, compiled by Visual C++ build 1500, 64-bit" on an x64 Windows 7 Professional Service Pack 1 machine with 8 GB of RAM; I installed it using the downloadable installer. I am testing with pgAdmin III, but ultimately this will be deployed within a Rails application.

Here are the values of some relevant configuration parameters:
shared_buffers = 1GB
temp_buffers = 8MB
work_mem = 10MB
maintenance_work_mem = 256MB
random_page_cost = 1.2
default_statistics_target = 10000

Table schemas:

reads -- ~250,000 rows

CREATE TABLE reads (
  id serial NOT NULL,
  device_id integer NOT NULL,
  value bigint NOT NULL,
  read_datetime timestamp without time zone NOT NULL,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT reads_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);
ALTER TABLE reads OWNER TO postgres;

CREATE INDEX index_reads_on_device_id
  ON reads USING btree (device_id);
CREATE INDEX index_reads_on_device_id_and_read_datetime
  ON reads USING btree (device_id, read_datetime);
CREATE INDEX index_reads_on_read_datetime
  ON reads USING btree (read_datetime);

devices -- ~25,000 rows

CREATE TABLE devices (
  id serial NOT NULL,
  serial_number character varying(20) NOT NULL,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT devices_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);
ALTER TABLE devices OWNER TO postgres;

CREATE UNIQUE INDEX index_devices_on_serial_number
  ON devices USING btree (serial_number COLLATE pg_catalog."default");

patient_devices -- ~25,000 rows

CREATE TABLE patient_devices (
  id serial NOT NULL,
  patient_id integer NOT NULL,
  device_id integer NOT NULL,
  issuance_datetime timestamp without time zone NOT NULL,
  unassignment_datetime timestamp without time zone,
  issued_value bigint NOT NULL,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT patient_devices_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);
ALTER TABLE patient_devices OWNER TO postgres;

CREATE INDEX index_patient_devices_on_device_id
  ON patient_devices USING btree (device_id);
CREATE INDEX index_patient_devices_on_issuance_datetime
  ON patient_devices USING btree (issuance_datetime);
CREATE INDEX index_patient_devices_on_patient_id
  ON patient_devices USING btree (patient_id);
CREATE INDEX index_patient_devices_on_unassignment_datetime
  ON patient_devices USING btree (unassignment_datetime);

patients -- ~1000 rows

CREATE TABLE patients (
  id serial NOT NULL,
  first_name character varying(50) NOT NULL,
  last_name character varying(50) NOT NULL,
  created_at timestamp without time zone NOT NULL,
  updated_at timestamp without time zone NOT NULL,
  CONSTRAINT patients_pkey PRIMARY KEY (id)
) WITH (OIDS=FALSE);
ALTER TABLE patients OWNER TO postgres;
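For concreteness, here is some made-up sample data showing one patient who switches devices partway through (all names, serial numbers, values, and dates below are invented purely for illustration):

INSERT INTO patients (id, first_name, last_name, created_at, updated_at)
VALUES (1, 'Jane', 'Doe', now(), now());

INSERT INTO devices (id, serial_number, created_at, updated_at)
VALUES (10, 'SN-A', now(), now()),
       (11, 'SN-B', now(), now());

-- Jane used device 10 until 2012-06-01, then device 11 (still assigned).
INSERT INTO patient_devices (id, patient_id, device_id, issuance_datetime,
                             unassignment_datetime, issued_value,
                             created_at, updated_at)
VALUES (100, 1, 10, '2012-01-01', '2012-06-01', 0, now(), now()),
       (101, 1, 11, '2012-06-01', NULL, 0, now(), now());

-- Device counters only ever increase, so the last read inside each
-- assignment window is the one that matters.
INSERT INTO reads (device_id, value, read_datetime, created_at, updated_at)
VALUES (10, 40, '2012-03-01', now(), now()),
       (10, 70, '2012-05-30', now(), now()),  -- last read on SN-A => 70
       (11, 25, '2012-08-01', now(), now()),
       (11, 90, '2013-01-15', now(), now());  -- last read on SN-B => 90

-- Expected lifetime value for Jane: (70 - 0) + (90 - 0) = 160,
-- and the reported serial number should be 'SN-B'.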
Finally, this is the query I am running:

SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM (
  SELECT DISTINCT patient_id, first_name, last_name,
         MAX(max_read) OVER (PARTITION BY patient_id) AS latest_read,
         SUM(value) OVER (PARTITION BY patient_id) AS value,
         first_value(serial_number)
           OVER (PARTITION BY patient_id ORDER BY max_read DESC) AS serial_number
  FROM (
    SELECT patient_id, first_name, last_name,
           value - issued_value AS value, serial_number, read_datetime,
           MAX(read_datetime) OVER (PARTITION BY patient_devices.id) AS max_read
    FROM reads
    INNER JOIN devices ON devices.id = reads.device_id
    INNER JOIN patient_devices ON patient_devices.device_id = devices.id
      AND read_datetime >= issuance_datetime
      AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
    INNER JOIN patients ON patients.id = patient_devices.patient_id
    WHERE read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
  ) AS first
  WHERE read_datetime = max_read
) AS filtered
INNER JOIN (
  SELECT DISTINCT patient_id, SUM(value) AS lifetime_value
  FROM (
    SELECT patient_id, value - issued_value AS value, read_datetime,
           MAX(read_datetime) OVER (PARTITION BY patient_devices.id) AS max_read
    FROM reads
    INNER JOIN devices ON devices.id = reads.device_id
    INNER JOIN patient_devices ON patient_devices.device_id = devices.id
      AND read_datetime >= issuance_datetime
      AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
  ) AS first
  WHERE read_datetime = max_read
  GROUP BY patient_id
) AS lifetime ON filtered.patient_id = lifetime.patient_id;

The EXPLAIN (ANALYZE, BUFFERS) output is here: http://explain.depesz.com/s/7Zr

Ultimately, what I want is the sum of values for each patient. Each patient is assigned a device, and the device records incremental values. Because the values are incremental, if a patient never switches devices, the reported value should simply be the last value recorded for that patient. If a patient does switch devices, the reported value should be the sum of the last value recorded on each device the patient was assigned. This is what leads to the conditions read_datetime >= issuance_datetime AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp).

In addition, I must report the serial number of the device the patient was most recently assigned (or is currently assigned). The only way I could come up with to do that is first_value(serial_number) OVER (PARTITION BY patient_id ORDER BY max_read DESC) AS serial_number.

Finally, I must report two values: one restricted to a time range and one that is the lifetime value. To satisfy this requirement, I run essentially the same query twice (once with the WHERE clause on read_datetime and once without) and INNER JOIN the results.

My questions are:

1. Can I make the query as I have constructed it faster by adding indices or changing any Postgres configuration parameters?
2. Can I modify the query to return the same results in a faster way?
3. Can I modify my tables to make this query (which is the crux of my application) run faster?

Thanks
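P.S. While writing this up, I sketched an alternative formulation of the "last read per assignment" step using DISTINCT ON, in case it makes my intent clearer. This only covers the lifetime half and leaves out the serial-number logic; I have not benchmarked it, and I suspect it handles ties on read_datetime differently than my window-function version does:

SELECT patient_id,
       SUM(value - issued_value) AS lifetime_value
FROM (
  -- keep only the latest read inside each assignment window
  SELECT DISTINCT ON (pd.id)
         pd.patient_id, pd.issued_value, r.value
  FROM patient_devices pd
  INNER JOIN reads r
          ON r.device_id = pd.device_id
         AND r.read_datetime >= pd.issuance_datetime
         AND r.read_datetime < COALESCE(pd.unassignment_datetime,
                                        'infinity'::timestamp)
  ORDER BY pd.id, r.read_datetime DESC
) AS last_reads
GROUP BY patient_id;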
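P.P.S. For question 2, one idea I have been toying with (but have not verified for correctness or performance) is computing the time-ranged sum and the lifetime sum in a single pass over reads, instead of joining two nearly identical subqueries. Roughly like this, again omitting the serial-number logic for brevity:

SELECT patient_id,
       SUM(CASE WHEN read_datetime = max_read_in_range
                THEN value - issued_value END) AS value,
       SUM(CASE WHEN read_datetime = max_read
                THEN value - issued_value END) AS lifetime_value
FROM (
  SELECT pd.patient_id, pd.issued_value, r.value, r.read_datetime,
         -- latest read ever within this assignment window
         MAX(r.read_datetime) OVER (PARTITION BY pd.id) AS max_read,
         -- latest read within this assignment window that also falls
         -- inside the reporting range
         MAX(CASE WHEN r.read_datetime BETWEEN '2012-01-01 10:30:01'
                                           AND '2013-05-18 03:03:42'
                  THEN r.read_datetime END)
           OVER (PARTITION BY pd.id) AS max_read_in_range
  FROM reads r
  INNER JOIN patient_devices pd
          ON pd.device_id = r.device_id
         AND r.read_datetime >= pd.issuance_datetime
         AND r.read_datetime < COALESCE(pd.unassignment_datetime,
                                        'infinity'::timestamp)
) AS per_assignment
GROUP BY patient_id;

Would something along these lines be expected to beat two scans of reads, or is the planner usually smart enough that it does not matter?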