[PERFORM] Analyze and default_statistics_target
I was under the impression that the default_statistics_target was a percentage of rows to analyze. Maybe this is not the case?

I ran an analyze during a quiet point last night, and for a few of my large tables I didn't get what I consider a reasonable sampling of rows. When running with verbose enabled, it appeared that a maximum of 24 rows were being analyzed, including on tables exceeding 4-8mm rows. My default_statistics_target = 80.

Shouldn't I be analyzing a larger percentage of these big tables? What is the unit-of-measure used for default_statistics_target?

Thanks in advance,
AJ

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analyze and default_statistics_target
On 21.01.2013 17:29, AJ Weber wrote:
> I was under the impression that the default_statistics_target was a
> percentage of rows to analyze. Maybe this is not the case?

Nope.

> I ran an analyze during a quiet point last night and for a few of my
> large tables, I didn't get what I consider a reasonable sampling of
> rows. When running with verbose enabled, it appeared that a maximum of
> 24 rows were being analyzed, including on tables exceeding 4-8mm rows.
> My default_statistics_target = 80. Shouldn't I be analyzing a larger
> percentage of these big tables?

Analyze only needs a fairly small random sample of the rows in the table to get a picture of what the data looks like. Compare with e.g. opinion polls; you only need to sample a few thousand people to get a result with a reasonable error bound.

That's for estimating the histogram. Estimating ndistinct is a different story, and it's well known that the estimates of ndistinct are sometimes wildly wrong.

> What is the unit-of-measure used for default_statistics_target?

It's the number of entries stored in the histogram and most-common-values list in pg_statistic. See also http://www.postgresql.org/docs/devel/static/planner-stats.html:

    The amount of information stored in pg_statistic by ANALYZE, in
    particular the maximum number of entries in the most_common_vals
    and histogram_bounds arrays for each column, can be set on a
    column-by-column basis using the ALTER TABLE SET STATISTICS
    command, or globally by setting the default_statistics_target
    configuration variable. The default limit is presently 100 entries.

- Heikki
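[A minimal sketch of the per-column override described above; the table and column names are hypothetical. Note the statistics target also drives the sample size: ANALYZE reads roughly 300 rows per histogram entry, not a percentage of the table.]

```sql
-- Hypothetical table/column: raise the per-column target above the default.
ALTER TABLE big_table ALTER COLUMN customer_id SET STATISTICS 500;

-- Re-analyze so the larger histogram/MCV list is actually collected;
-- VERBOSE reports how many rows were sampled.
ANALYZE VERBOSE big_table;
```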
Re: [PERFORM] Analyze and default_statistics_target
AJ Weber wrote:
> What is the unit-of-measure used for default_statistics_target?

Number of entries in pg_stats.histogram_bounds or pg_stats.most_common_vals.

Yours,
Laurenz Albe
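[Those entry counts can be checked directly against the pg_stats view. A sketch, with a hypothetical table name; the double cast is a workaround because most_common_vals and histogram_bounds are of type anyarray.]

```sql
-- How many MCV and histogram entries ANALYZE actually stored per column.
SELECT attname,
       array_length(most_common_vals::text::text[], 1)  AS mcv_entries,
       array_length(histogram_bounds::text::text[], 1)  AS histogram_entries
FROM pg_stats
WHERE tablename = 'big_table';
```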
Re: [PERFORM] Analyze on temp table taking very long
Thanks for the tip. I'll also check on the lock; it's a customer setup and we don't get access to the box very frequently.

Also, the code was something like this:

    loop
        insert data into the tmptbl
        analyze tmptbl
    end loop

If I replace this with

    loop
        insert data into the tmptbl
    end loop
    analyze tmptbl

it goes through fine.

-mridula

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, March 22, 2011 3:57 PM
To: Mahadevan, Mridula
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Analyze on temp table taking very long

"Mahadevan, Mridula" <mridula.mahade...@ironmountain.com> writes:
> This has been running fine for a while on multiple setups, large and
> small volumes. The setups all have the same hardware configuration.
> On one particular setup with about 200k records, this analyze runs for
> 45 min and then times out (statement timeout is set to 45 min).
> Typically this takes a few seconds at best. But when I move the
> analyze outside the loop everything runs fine.

Is it actually *running*, as in doing something, or is it just blocked? I can't immediately think of any reason for some other process to have a lock on a temp table that belongs to your process; but it seems unlikely that ANALYZE would randomly take much longer than expected unless something was preventing it from making progress. Look into pg_locks and/or watch the backend with strace next time this happens.

			regards, tom lane
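[The restructuring above, as a self-contained sketch. The table, data, and batch count are hypothetical stand-ins for the real cursor loop; the DO block assumes PostgreSQL 9.0+.]

```sql
CREATE TEMP TABLE tmptbl (id int, payload text);

DO $$
BEGIN
    FOR batch IN 1..10 LOOP
        -- populate inside the loop...
        INSERT INTO tmptbl
        SELECT g, 'batch ' || batch
        FROM generate_series(1, 1000) AS g;
    END LOOP;
END;
$$;

-- ...but analyze only once, after all the inserts.
ANALYZE tmptbl;
```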
[PERFORM] Analyze on temp table taking very long
I have a function wherein, in a cursor loop, I

1. create a temp table (on commit drop)
2. insert data into it
3. run analyze on the table

and select/update outside the loop.

This has been running fine for a while on multiple setups, large and small volumes. The setups all have the same hardware configuration. On one particular setup with about 200k records, this analyze runs for 45 min and then times out (statement timeout is set to 45 min); typically this takes a few seconds at best. But when I move the analyze outside the loop everything runs fine.

A section of the code for reference:

    CREATE TEMP TABLE tmp_hierarchy_sorted (
        sort_id serial,
        aqu_document_id integer,
        parent_id integer,
        ancestor_id integer,
        object_hierarchy character varying(255),
        object_hierarchy_array text[],
        levels integer
    ) ON COMMIT DROP TABLESPACE tblspc_tmp;
    CREATE UNIQUE INDEX tmp_hierarchy_sorted_aqu_document_id_idx
        ON tmp_hierarchy_sorted USING btree (aqu_document_id)
        TABLESPACE tblspc_index;';
    execute vSQL;

    -- get min doc number for that collection based on existing promoted
    -- collections in the matter
    select coalesce(max(doc_number_max), 0) into iMin_Doc_number
    FROM doc_Collection c
    WHERE exists (SELECT 1 FROM doc_collection c1
                  WHERE c1.id = iCollectionId
                    and c1.matter_id = c.matter_id
                    and c1.doc_number_prefix = c.doc_number_prefix)
      AND status = 'PROMOTED';

    -- go ancestor by ancestor for ones that are not loose files
    open curAncestor for
        select distinct id
        FROM aqu_document_hierarchy h
        where collection_Id = iCollectionId
          and ancestor_id = -1 and parent_id = -1
          AND EXISTS (select 1 from aqu_document_hierarchy h1
                      where h1.ancestor_id = h.id)
        order by id;
    LOOP
        FETCH curAncestor into iAncestor_id;
        EXIT WHEN NOT FOUND;

        -- insert each ancestor into the table as this is not part of the bulk insert
        vSQL := 'INSERT INTO tmp_hierarchy_sorted
                     (aqu_document_id, parent_id, ancestor_id, object_hierarchy,
                      object_hierarchy_array, levels)
                 (select id, -1, -1, object_hierarchy,
                         regexp_split_to_array(object_hierarchy, ''/''), 0
                  from aqu_document_hierarchy
                  where collection_Id = ' || iCollectionId || '
                    AND id = ' || iAncestor_id || ')';
        execute vSQL;

        -- insert filtered documents for that ancestor
        vSQL := 'INSERT INTO tmp_hierarchy_sorted
                     (aqu_document_id, parent_id, ancestor_id, object_hierarchy,
                      object_hierarchy_array, levels)
                 (SELECT id, parent_id, ancestor_id, object_hierarchy,
                         regexp_split_to_array(object_hierarchy, ''/'') as object_hierarchy_array,
                         array_length(regexp_split_to_array(object_hierarchy, ''/''), 1) as levels
                  FROM aqu_document_hierarchy h
                  WHERE EXISTS (SELECT 1 FROM aqu_document_error_details e
                                where e.aqu_document_id = h.id
                                  and e.exit_status in (2,3,4,5))
                    AND ancestor_id = ' || iAncestor_id || '
                  ORDER BY regexp_split_to_array(object_hierarchy, ''/''));';
        execute vSQL;

        ANALYZE tmp_hierarchy_sorted;
    END LOOP;

Thanks for the help
-mridula
Re: [PERFORM] Analyze on temp table taking very long
"Mahadevan, Mridula" <mridula.mahade...@ironmountain.com> writes:
> This has been running fine for a while on multiple setups, large and
> small volumes. The setups all have the same hardware configuration.
> On one particular setup with about 200k records, this analyze runs for
> 45 min and then times out (statement timeout is set to 45 min).
> Typically this takes a few seconds at best. But when I move the
> analyze outside the loop everything runs fine.

Is it actually *running*, as in doing something, or is it just blocked? I can't immediately think of any reason for some other process to have a lock on a temp table that belongs to your process; but it seems unlikely that ANALYZE would randomly take much longer than expected unless something was preventing it from making progress. Look into pg_locks and/or watch the backend with strace next time this happens.

			regards, tom lane
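[A starting point for the pg_locks check suggested above, using only the standard system view; the regclass cast assumes the ungranted lock is on a relation:]

```sql
-- Lock requests currently waiting (not granted). Run this from another
-- session the next time the ANALYZE hangs.
SELECT l.pid, l.locktype, l.relation::regclass AS relation, l.mode
FROM pg_locks l
WHERE NOT l.granted;
```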
Re: [PERFORM] analyze
On Tue, Jan 29, 2008 at 04:28:45PM +0200, Adrian Moisey wrote:
> Seriously though, how do I try measure this?

Is autovacuum not going to work for your case?

A

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
[PERFORM] analyze
Hi

How long is a piece of string? While we're at it, how often do I vacuum analyze?

Seriously though, how do I try to measure this?

-- 
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842
Re: [PERFORM] analyze
On Jan 29, 2008 8:28 AM, Adrian Moisey <[EMAIL PROTECTED]> wrote:
> How long is a piece of string? While we're at it, how often do I vacuum
> analyze?
>
> Seriously though, how do I try measure this?

1: Turn on autovacuum.

2: Look up the thread on nagios plugins for pgsql and rip the query for checking bloat out of it, or alternatively, use NAGIOS I guess. :)

Monitor your db for bloat, and if autovacuum isn't keeping up, either set it to be more aggressive, or schedule manual vacuums for the tables that need it.
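[As a rough first pass before wiring up full bloat checks, the stock statistics views already expose dead-row counts. This sketch assumes pg_stat_user_tables with the n_live_tup/n_dead_tup columns (PostgreSQL 8.3+):]

```sql
-- Tables with many dead rows relative to live rows are the first
-- candidates for more aggressive (auto)vacuum settings.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS pct_dead
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```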
[PERFORM] Analyze makes queries slow...
Hi all,

I posted this problem on the sql list, and was referred to this list instead. I have attached an sql statement that normally runs under 3 minutes. That is, until I vacuum analyze the database (or just the tables in the query); then the same query runs longer than 12 hours, and I have to kill it. However, 90% of queries are faster after analyzing on this database; there are two or three, including this one, that take forever.

I have tried to reverse engineer the explain plan from before analyzing, to come up with an sql statement, using proper joins, to force the planner to do the original join, but although I came close, I never got the same result as the original query.

I suspect that this might be caused by some of the crazy indexes that were built on some of these tables, but I can't really do much about that, unless I can come up with a very good reason to nuke them.

I also attached the create table statements for all the tables, as well as a row count of each.

Can somebody help me with guidelines or something similar, to understand exactly what is happening in the explain plan?
TIA
Stefan

Aggregate (cost=52.00..61.64 rows=32 width=241)
  ->  Group (cost=52.00..57.62 rows=321 width=241)
        ->  Sort (cost=52.00..52.80 rows=321 width=241)
              Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, ss.supplier_price_curr_cost, sk.sku_price_curr_cost
              ->  Merge Join (cost=36.38..38.62 rows=321 width=241)
                    Merge Cond: (("outer".group_cde = "inner".group_cde) AND ("outer".brn_code = "inner".brn_code))
                    Join Filter: (("outer".price_tmpl_hdr_cde = "inner".price_tmpl_hdr_cde) AND ("outer".price_tmpl_hdr_reg = "inner".price_tmpl_hdr_reg))
                    ->  Nested Loop (cost=0.00..1407212.08 rows=63 width=179)
                          Join Filter: (("inner".sku_mst_cde = "outer".sku) AND ("inner".group_cde = "outer".group_code))
                          ->  Merge Join (cost=0.00..1405644.89 rows=315 width=135)
                                Merge Cond: ("outer".group_code = "inner".group_code)
                                Join Filter: ("outer".sku = "inner".sku)
                                ->  Nested Loop (cost=0.00..4826563.70 rows=8694 width=108)
                                      ->  Index Scan using master_fpp_values_idx2 on master_fpp_values m (cost=0.00..3766902.34 rows=215650 width=54)
                                            Filter: (fpp_code = '200307'::text)
                                      ->  Index Scan using pk_supplier_price on supplier_price ss (cost=0.00..4.90 rows=1 width=54)
                                            Index Cond: ((ss.group_cde = "outer".group_code) AND (ss.sku_mst_cde = "outer".sku) AND (ss.supplier_cde = "outer".supplier_code))
                                ->  Index Scan using master_sku_descr_idx3 on master_sku_descr s (cost=0.00..2535.04 rows=10758 width=27)
                                      Filter: (control_code = '0'::text)
                          ->  Index Scan using idx_sku_price on sku_price sk (cost=0.00..4.96 rows=1 width=44)
                                Index Cond: ((sk.group_cde = "outer".group_cde) AND (sk.sku_mst_cde = "outer".sku_mst_cde) AND (sk.price_tmpl_hdr_cde = "outer".price_tmpl_hdr_cde) AND (sk.price_tmpl_hdr_reg = "outer".price_tmpl_hdr_reg))
                    ->  Sort (cost=36.38..36.87 rows=198 width=62)
                          Sort Key: p.group_cde, p.branch_cde
                          ->  Hash Join (cost=18.46..28.82 rows=198 width=62)
                                Hash Cond: ("outer".brn_code = "inner".branch_cde)
                                ->  Merge Join (cost=13.94..20.34 rows=198 width=33)
                                      Merge Cond: ("outer".country_code = "inner".from_ctry)
                                      ->  Index Scan using master_branch_descr_idx4 on master_branch_descr b (cost=0.00..33.12 rows=198 width=15)
                                      ->  Sort (cost=13.94..13.95 rows=4 width=18)
                                            Sort Key: f.from_ctry
                                            ->  Index Scan using forex_idx1 on forex f (cost=0.00..13.90 rows=4 width=18)
                                                  Index Cond: ((to_ctry = 'ZAF'::text) AND (fpp_code = '200307'::text))
                                ->  Hash (cost=4.02..4.02 rows=202 width=29)
                                      ->  Seq Scan on price_tmpl_det p (cost=0.00..4.02 rows=202 width=29)
(34 rows)

Aggregate (cost=163.58..163.61 rows=1 width=699)
  ->  Group (cost=163.58..163.60 rows=1 width=699)
        ->  Sort (cost=163.58..163.58 rows=1 width=699)
              Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, ss.supplier_price_curr_cost, sk.sku_price_curr_cost
              ->  Nested Loop (cost=115.56..163.57 rows=1 width=699)
                    Join Filter: (("outer".sku = "inner".sku)
Re: [PERFORM] Analyze makes queries slow...
On Mon, Aug 11, 2003 at 03:58:41PM +0200, Stef wrote:
> I have attached an sql statement that normally runs under 3 minutes.
> That is, until I vacuum analyze the database (or just the tables in
> the query), then the same query runs longer than 12 hours, and I have
> to kill it.

Hmm, I have noticed a similar problem with a query with an order by ... limit clause, although it runs only 10 times slower after analyze :)

The query joins one big table (20 000 rows) with several small tables (200-4000 rows), then orders by the primary key of the big table, limit 20. Without this order by ... limit clause, the query is 5 times faster after analyze.

Looking into explain analyze outputs:

1. Before vacuum analyze, the planner chooses nested loops; the deepest is:

    ->  Nested Loop (cost=0.00..116866.54 rows=19286 width=96) (actual time=0.14..1.39 rows=21 loops=1)
          ->  Index Scan Backward using big_table_pkey on big_table k (cost=0.00..1461.15 rows=19286 width=52) (actual time=0.07..0.47 rows=21 loops=1)
          ->  Index Scan using 4000rows_table_pkey on 4000rows_table zs (cost=0.00..5.97 rows=1 width=44) (actual time=0.02..0.02 rows=0 loops=21)

2. After analyze, it uses hash joins.

When I remove this order by ... limit clause, the query after analyze takes the same time, and the query before analyze is much slower. I won't blame the planner; how could it learn that it should first take those 20 rows and then perform the joins?

There is a where clause with a complex exists(subquery) condition on one of the big table's fields, but removing this condition does not change the query plan. Pure joining, without any additional conditions and with only the primary key of the big table in the select clause, runs 4 times slower than the whole query before vacuum analyze :)

Does the planner take the limit clause into consideration at all? Probably I'm missing something; I don't know much about the planner. Finally, I redesigned the query.

Regards,
Jacek
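[To answer the question above: the planner does account for LIMIT; it discounts a plan's total cost by the fraction of rows actually fetched, which is why a fast-start plan (index scan in key order feeding a nested loop) can win over a hash join that is cheaper in total but must build its hashes before returning the first row. A hedged illustration with hypothetical table names:]

```sql
-- Compare the plan with and without the LIMIT: with it, an
-- Index Scan (Backward) on big_table_pkey plus a nested loop can be
-- chosen purely for its low startup cost.
EXPLAIN ANALYZE
SELECT k.id, z.name
FROM big_table k
JOIN small_table z ON z.id = k.small_id
ORDER BY k.id        -- matches big_table's primary key
LIMIT 20;
```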
Re: [PERFORM] Analyze makes queries slow...
Stef <[EMAIL PROTECTED]> writes:
> = Could we see the results of EXPLAIN ANALYZE, rather than just
> = EXPLAIN, for the un-analyzed case?
> Attached the output of this.

Hmm... not immediately obvious where it's going wrong. Could you try this (after ANALYZE):

	set enable_mergejoin to off;
	explain analyze ... query ...

If it finishes in a reasonable amount of time, send the explain output.

			regards, tom lane