After about 40 inutes the slow query finally finished and the result of the
EXPLAIN plan can be found here:

https://explain.depesz.com/s/BX22

Thanks,
Alessandro Ferrucci

On Tue, Apr 25, 2017 at 11:10 PM, Alessandro Ferrucci <
alessandroferru...@gmail.com> wrote:

> Hello - I am migrating a current system to PostgreSQL and I am having an
> issue with a relatively straightforward query being extremely slow.
>
> The following are the definitions of the tables:
>
> CREATE TABLE popt_2017.unit
> (
>   id serial NOT NULL,
>   unit_id text,
>   batch_id text,
>   create_date timestamp without time zone DEFAULT now(),
>   update_date timestamp without time zone,
>   CONSTRAINT unit_pkey PRIMARY KEY (id)
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE TABLE popt_2017.field
> (
>   id serial NOT NULL,
>   unit_id integer,
>   subunit_data_id integer,
>   field_name character varying(50),
>   page_id character varying(20),
>   page_type character varying(20),
>   batch_id character varying(20),
>   file_name character varying(20),
>   data_concept integer,
>   "GROUP" integer,
>   omr_group integer,
>   pres integer,
>   reg_data text,
>   ocr_conf text,
>   ocr_dict text,
>   ocr_phon text,
>   create_date timestamp without time zone DEFAULT now(),
>   update_date timestamp without time zone,
>   CONSTRAINT field_pkey PRIMARY KEY (id),
>   CONSTRAINT field_subunit_data_id_fkey FOREIGN KEY (subunit_data_id)
>       REFERENCES popt_2017.subunit (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT field_unit_id_fk FOREIGN KEY (unit_id)
>       REFERENCES popt_2017.unit (id) MATCH FULL
>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT field_unit_id_fkey FOREIGN KEY (unit_id)
>       REFERENCES popt_2017.unit (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
>
> CREATE TABLE popt_2017.answer
> (
>   id serial NOT NULL,
>   field_id integer,
>   ans_status integer,
>   ans text,
>   luggage text,
>   arec text,
>   kfi_partition integer,
>   final boolean,
>   length integer,
>   create_date timestamp without time zone DEFAULT now(),
>   update_date timestamp without time zone,
>   CONSTRAINT answer_pkey PRIMARY KEY (id),
>   CONSTRAINT answer_field_id_fk FOREIGN KEY (field_id)
>       REFERENCES popt_2017.field (id) MATCH FULL
>       ON UPDATE NO ACTION ON DELETE NO ACTION,
>   CONSTRAINT answer_field_id_fkey FOREIGN KEY (field_id)
>       REFERENCES popt_2017.field (id) MATCH SIMPLE
>       ON UPDATE NO ACTION ON DELETE NO ACTION
> )
> WITH (
>   OIDS=FALSE
> );
>
> Below are the index definitions for those tables:
>
> UNIT:
> CREATE UNIQUE INDEX unit_pkey ON unit USING btree (id);
> CREATE INDEX unit_unit_id_idx ON unit USING btree (unit_id);
>
> FIELD:
> CREATE UNIQUE INDEX field_pkey ON field USING btree (id)
> CREATE INDEX field_unit_id_idx ON field USING btree (unit_id)
> CREATE INDEX field_subunit_id_idx ON field USING btree (subunit_data_id)
> CREATE INDEX field_field_name_idx ON field USING btree (field_name)
>
> ANSWER:
> CREATE UNIQUE INDEX answer_pkey ON answer USING btree (id)
> CREATE INDEX answer_field_id_idx ON answer USING btree (field_id)
> CREATE INDEX answer_ans_idx ON answer USING btree (ans)
>
> The tables each have the following number of rows:
>
> UNIT: 10,315
> FIELD: 139,397,965
> ANSWER: 3,463,300
>
> The query in question is:
>
> SELECT
>    UNIT.ID AS UNIT_ID,
>    UNIT.UNIT_ID AS UNIT_UNIT_ID,
>    UNIT.BATCH_ID AS UNIT_BATCH_ID,
>    UNIT.CREATE_DATE AS UNIT_CREATE_DATE,
>    UNIT.UPDATE_DATE AS UNIT_UPDATE_DATE
> FROM
>    UNIT, FIELD, ANSWER
> WHERE
>    UNIT.ID=FIELD.UNIT_ID AND
>    FIELD.ID=ANSWER.FIELD_ID AND
>    FIELD.FIELD_NAME='SHEETS_PRESENT' AND
>    ANSWER.ANS='2';
>
> I attempted to run an EXPLAIN (ANALYZE,BUFFERS) and the query has been
> running for 32 minutes now, So I won't be able to post the results (as I've
> never been able to get the query to actually finish.
>
> But, if I remove the join to UNIT (and just join FIELD and ANSWER) the
> resulting query is sufficiently fast, (the first time it ran in roughly 3
> seconds), the query as such is:
>
> SELECT * FROM
>    ANSWER, FIELD
> WHERE
>    FIELD.ID=ANSWER.FIELD_ID AND
>    FIELD.FIELD_NAME='SHEETS_PRESENT' AND
>    ANSWER.ANS='2';
>
> The EXPLAIN ( ANALYZE, BUFFERS ) output of that query can be found here
> https://explain.depesz.com/s/ueJq
>
> These tables are static for now, so they do not get DELETEs or INSERTS at
> all and I have run VACUUM ANALYZE on all the affected tables.
>
> I'm running PostgreSQL PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu,
> compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-52), 64-bit
>
> I'm running this on RHEL 6.9
>
> On a server with 32 GB of ram, 2 CPUs.
>
> The following are the changes to postgresql.conf that I have made:
>
> shared_buffers = 7871MB
> effective_cache_size = 23611MB
> work_mem = 1000MB
> maintenance_work_mem = 2048MB
>
> I have not changed the autovacuum settings, but since the tables are
> static for now and I've already ran VACUUM that should not have any effect.
>
> Any assistance that could be provided is greatly appreciated.
>
> Thank you,
> Alessandro Ferrucci
>
>
>
>
>
>
>


-- 
Signed,
Alessandro Ferrucci

Reply via email to