Re: [PERFORM] Techniques to Avoid Temp Files
On Thu, Jun 18, 2015 at 12:38 PM, Duane Murphy wrote:
> We are trying to improve performance by avoiding the temp file creation.
>
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
> STATEMENT: SELECT iiid.installed_item__id, item_detail.id,
>   item_detail.model_id, item_detail.type
>   FROM installed_item__item_detail AS iiid
>   INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
>   INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id)
>   INNER JOIN model ON (item.id = model.item__id AND model.id = $1)
>
> Our hypothesis is that the temp file creation is caused by the high row count of the
> installed_item__item_detail table.
>
> installed_item__item_detail: 72916824 rows (27 GB)
> item_detail: 59212436 rows (40 GB)
>
> The other two tables, item and model, are temporary tables created during
> this particular process. Unfortunately, I don't have those table sizes.

Those temporary tables aren't providing any output to the query, so their only role must be to restrict the rows returned by the permanent tables. If they restrict the result by a lot, the planner could do a nested loop over the temp tables, running indexed lookups against the permanent tables, assuming you have the right indexes.

Temporary tables do not get analyzed automatically, so you should probably run ANALYZE on them explicitly before this big query.

> What additional information can I gather in order to have a better
> understanding of how to improve this query?

What indexes do the tables have? What is the output of EXPLAIN, or better yet EXPLAIN (ANALYZE, BUFFERS), for the query?

Cheers,
Jeff
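Jeff's two suggestions above can be sketched together like this. The table and column names come from the original query; the literal 42 stands in for the $1 parameter and is purely a hypothetical placeholder:

```sql
-- Temporary tables are never auto-analyzed, so give the planner
-- row-count and distribution statistics before the big join:
ANALYZE item;
ANALYZE model;

-- Then capture the actual plan, including buffer hits/reads and any
-- on-disk sort or hash batches ($1 replaced with a concrete id here):
EXPLAIN (ANALYZE, BUFFERS)
SELECT iiid.installed_item__id, item_detail.id,
       item_detail.model_id, item_detail.type
FROM installed_item__item_detail AS iiid
INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id)
INNER JOIN model ON (item.id = model.item__id AND model.id = 42);
```

If the temp tables really are highly selective, the post-ANALYZE plan should switch to nested loops with index scans on the two large permanent tables.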
Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)
-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Friday, June 19, 2015 11:07 AM
To: Ian Pushee; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:54 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

On 6/19/2015 10:46 AM, Igor Neyman wrote:
>
> Probably the events_confidnce index is not very selective; that's why the optimizer
> prefers a seq scan.
> I'd try to create an index on (name, eventspy_id, camera_id, type, status).
>
> Also, the most recent 9.2 release is 9.2.13; you should upgrade.
>
> Regards,
> Igor Neyman

Hi Igor,

I already have an index on (name, eventspy_id, camera_id, type, status)... that is the index being used (apparently silently) when I set random_page_cost=1.0.

Thanks,
-Ian

--

Well, with 8GB of RAM on the machine you probably should not be running with the default config parameters. Depending on what else this machine is used for, and on the queries you run, you should definitely adjust the Postgres config.

If this machine is a dedicated database server, I'd start with the following parameters changed from their defaults:

shared_buffers = 1024MB
temp_buffers = 8MB
work_mem = 64MB
effective_cache_size = 1024MB
random_page_cost = 2.5
cpu_tuple_cost = 0.03
cpu_index_tuple_cost = 0.05

and see how it goes.

Regards,
Igor Neyman

---

Oops, that should be at least:

effective_cache_size = 5120MB

on a dedicated server.

Regards,
Igor Neyman

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)
-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:54 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

On 6/19/2015 10:46 AM, Igor Neyman wrote:
>
> Probably the events_confidnce index is not very selective; that's why the optimizer
> prefers a seq scan.
> I'd try to create an index on (name, eventspy_id, camera_id, type, status).
>
> Also, the most recent 9.2 release is 9.2.13; you should upgrade.
>
> Regards,
> Igor Neyman

Hi Igor,

I already have an index on (name, eventspy_id, camera_id, type, status)... that is the index being used (apparently silently) when I set random_page_cost=1.0.

Thanks,
-Ian

--

Well, with 8GB of RAM on the machine you probably should not be running with the default config parameters. Depending on what else this machine is used for, and on the queries you run, you should definitely adjust the Postgres config.

If this machine is a dedicated database server, I'd start with the following parameters changed from their defaults:

shared_buffers = 1024MB
temp_buffers = 8MB
work_mem = 64MB
effective_cache_size = 1024MB
random_page_cost = 2.5
cpu_tuple_cost = 0.03
cpu_index_tuple_cost = 0.05

and see how it goes.

Regards,
Igor Neyman
Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)
On 6/19/2015 10:47 AM, Andreas Kretschmer wrote:
>> Explain Analyze outputs (links as requested):
>> Default plan: http://explain.depesz.com/s/ib3k
>> Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP
>>
>> Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
>> All pgsql settings are at their defaults.
>
> Increase work_mem, either per session via
>
> set work_mem = 'xxxMB';
>
> or in postgresql.conf, followed by a reload.

Hi Andreas,

The number of rows in the events table isn't constrained, so unfortunately it isn't feasible to set work_mem high enough to guarantee an in-memory sort. Forcing the planner to use the index produces a fast query, so I'm wondering if there is a more general way to get the planner to take into account that work_mem isn't big enough for the sort, which will result in a MUCH more costly external merge.

Thanks,
-Ian
Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)
On 6/19/2015 10:46 AM, Igor Neyman wrote:
> Probably the events_confidnce index is not very selective; that's why the optimizer
> prefers a seq scan.
> I'd try to create an index on (name, eventspy_id, camera_id, type, status).
>
> Also, the most recent 9.2 release is 9.2.13; you should upgrade.
>
> Regards,
> Igor Neyman

Hi Igor,

I already have an index on (name, eventspy_id, camera_id, type, status)... that is the index being used (apparently silently) when I set random_page_cost=1.0.

Thanks,
-Ian
Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)
-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Ian Pushee
Sent: Friday, June 19, 2015 10:34 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)

Hi Folks,

This is my first time posting here, so hopefully I manage to convey all the information needed.

We have a simple query that just started giving us problems in production when the number of rows gets too large (>100k). The issue seems to be that the planner wants to sort the rows using a sequential scan rather than the index provided specifically for this query. This isn't a problem with low numbers of rows, but eventually the query outgrows work_mem and spills to disk, slowing down the query greatly. I know the common answer is to increase work_mem... but since this table's growth is unpredictable, that isn't a viable strategy.

I've tried increasing shared_buffers and effective_cache_size, but that doesn't appear to affect the plan chosen here. Setting random_page_cost=1.0 works, but I'm hoping for a more general solution that doesn't require setting that locally each time I run the query. I guess my real question is whether or not there is any way to get the planner to take into account the fact that it's going to need to do an 'external merge', and that it is going to take a LONG time?
Table and Index Schemas:

CREATE TABLE events
(
  id serial NOT NULL,
  name character varying(64),
  eventspy_id integer NOT NULL,
  camera_id integer NOT NULL,
  start_time timestamp without time zone NOT NULL,
  millisecond smallint NOT NULL,
  uid smallint NOT NULL,
  update_time timestamp without time zone NOT NULL DEFAULT now(),
  length integer NOT NULL,
  objects text NOT NULL,
  priority smallint NOT NULL,
  type character varying(45) NOT NULL DEFAULT 'alarm'::character varying,
  status event_status NOT NULL DEFAULT 'new'::event_status,
  confidence smallint NOT NULL DEFAULT 100::smallint,
  CONSTRAINT events_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

CREATE INDEX events_confidnce ON events USING btree (confidence);

CREATE INDEX events_summary ON events USING btree
  (name COLLATE pg_catalog."default", eventspy_id, camera_id,
   type COLLATE pg_catalog."default", status);

Query:

SELECT name, type, eventspy_id, camera_id, status,
       COUNT(id), MAX(update_time), MIN(start_time), MAX(start_time)
FROM events
WHERE confidence >= 0
GROUP BY name, eventspy_id, camera_id, type, status;

Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

Thanks for any help you can provide,
-Ian Pushee

---

Probably the events_confidnce index is not very selective; that's why the optimizer prefers a seq scan.
I'd try to create an index on (name, eventspy_id, camera_id, type, status).

Also, the most recent 9.2 release is 9.2.13; you should upgrade.

Regards,
Igor Neyman
Re: [PERFORM] Slow query (planner insisting on using 'external merge' sort type)
> Explain Analyze outputs (links as requested):
> Default plan: http://explain.depesz.com/s/ib3k
> Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP
>
> Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
> All pgsql settings are at their defaults.

Increase work_mem, either per session via

set work_mem = 'xxxMB';

or in postgresql.conf, followed by a reload.
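Spelled out, the per-session route looks like this. The 256MB figure is purely illustrative, not a recommendation; size it against available RAM and max_connections:

```sql
-- Raise the per-sort/per-hash memory budget for this session only;
-- postgresql.conf and other sessions are unaffected:
SET work_mem = '256MB';

-- ... run the expensive query here ...

-- Return to the server default afterwards:
RESET work_mem;
```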
[PERFORM] Slow query (planner insisting on using 'external merge' sort type)
Hi Folks,

This is my first time posting here, so hopefully I manage to convey all the information needed.

We have a simple query that just started giving us problems in production when the number of rows gets too large (>100k). The issue seems to be that the planner wants to sort the rows using a sequential scan rather than the index provided specifically for this query. This isn't a problem with low numbers of rows, but eventually the query outgrows work_mem and spills to disk, slowing down the query greatly. I know the common answer is to increase work_mem... but since this table's growth is unpredictable, that isn't a viable strategy.

I've tried increasing shared_buffers and effective_cache_size, but that doesn't appear to affect the plan chosen here. Setting random_page_cost=1.0 works, but I'm hoping for a more general solution that doesn't require setting that locally each time I run the query. I guess my real question is whether or not there is any way to get the planner to take into account the fact that it's going to need to do an 'external merge', and that it is going to take a LONG time?
Table and Index Schemas:

CREATE TABLE events
(
  id serial NOT NULL,
  name character varying(64),
  eventspy_id integer NOT NULL,
  camera_id integer NOT NULL,
  start_time timestamp without time zone NOT NULL,
  millisecond smallint NOT NULL,
  uid smallint NOT NULL,
  update_time timestamp without time zone NOT NULL DEFAULT now(),
  length integer NOT NULL,
  objects text NOT NULL,
  priority smallint NOT NULL,
  type character varying(45) NOT NULL DEFAULT 'alarm'::character varying,
  status event_status NOT NULL DEFAULT 'new'::event_status,
  confidence smallint NOT NULL DEFAULT 100::smallint,
  CONSTRAINT events_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);

CREATE INDEX events_confidnce ON events USING btree (confidence);

CREATE INDEX events_summary ON events USING btree
  (name COLLATE pg_catalog."default", eventspy_id, camera_id,
   type COLLATE pg_catalog."default", status);

Query:

SELECT name, type, eventspy_id, camera_id, status,
       COUNT(id), MAX(update_time), MIN(start_time), MAX(start_time)
FROM events
WHERE confidence >= 0
GROUP BY name, eventspy_id, camera_id, type, status;

Explain Analyze outputs (links as requested):
Default plan: http://explain.depesz.com/s/ib3k
Forced index (random_page_cost=1.0): http://explain.depesz.com/s/lYaP

Software/Hardware: PGSql 9.2.1, Windows 8.1, 8GB RAM
All pgsql settings are at their defaults.

Thanks for any help you can provide,
-Ian Pushee
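As an aside on "setting that locally each time": a standard PostgreSQL pattern (not something proposed in this thread) is SET LOCAL, which scopes a planner override to a single transaction so it cannot leak into other queries on the same connection:

```sql
BEGIN;
-- Applies only inside this transaction; reverts automatically
-- at COMMIT or ROLLBACK:
SET LOCAL random_page_cost = 1.0;

SELECT name, type, eventspy_id, camera_id, status,
       COUNT(id), MAX(update_time), MIN(start_time), MAX(start_time)
FROM events
WHERE confidence >= 0
GROUP BY name, eventspy_id, camera_id, type, status;
COMMIT;
```

This still means touching the setting per query, but it removes the risk of forgetting to reset it.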
Re: [PERFORM] Techniques to Avoid Temp Files
Duane Murphy wrote:
> We are trying to improve performance by avoiding the temp file creation.
>
> LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp8068.125071", size 58988604
> STATEMENT: SELECT iiid.installed_item__id, item_detail.id,
>   item_detail.model_id, item_detail.type
>   FROM installed_item__item_detail AS iiid
>   INNER JOIN item_detail ON iiid.item_detail__id = item_detail.id
>   INNER JOIN item ON (item.installed_item__id = iiid.installed_item__id)
>   INNER JOIN model ON (item.id = model.item__id AND model.id = $1)
>
> What are the causes of temp file creation?

Operations like hashes and sorts that need more space than work_mem allows.

> What additional information can I gather in order to have a better understanding
> of how to improve this query?

It would be really useful to see the result of "EXPLAIN (ANALYZE, BUFFERS) SELECT ..." for your query. But essentially the answer to avoiding temporary files is always "increase work_mem".

Yours,
Laurenz Albe
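Beyond the per-statement LOG lines, the cumulative temp-file pressure per database is visible in the standard pg_stat_database view (available since 9.2; not mentioned in the thread, but useful for judging whether a work_mem bump helped):

```sql
-- Count and total volume of temporary files written per database
-- since the last statistics reset:
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
ORDER BY temp_bytes DESC;
```

If temp_files keeps climbing after a configuration change, the sorts and hashes are still spilling to disk.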