[PERFORM] Analyze and default_statistics_target

2013-01-21 Thread AJ Weber
I was under the impression that the default_statistics_target was a 
percentage of rows to analyze.  Maybe this is not the case?


I ran an analyze during a quiet point last night and for a few of my 
large tables, I didn't get what I consider a reasonable sampling of 
rows.  When running with verbose enabled, it appeared that a maximum 
of 24000 rows were being analyzed, including on tables exceeding 4-8 
million rows.  My default_statistics_target = 80.


Shouldn't I be analyzing a larger percentage of these big tables?

What is the unit-of-measure used for default_statistics_target?

Thanks in advance,
AJ





Re: [PERFORM] Analyze and default_statistics_target

2013-01-21 Thread Heikki Linnakangas

On 21.01.2013 17:29, AJ Weber wrote:

I was under the impression that the default_statistics_target was a
percentage of rows to analyze. Maybe this is not the case?


Nope.


I ran an analyze during a quiet point last night and for a few of my
large tables, I didn't get what I consider a reasonable sampling of
rows. When running with verbose enabled, it appeared that a maximum of
24000 rows were being analyzed, including on tables exceeding 4-8 million
rows. My default_statistics_target = 80.

Shouldn't I be analyzing a larger percentage of these big tables?


Analyze only needs a fairly small random sample of the rows in the table 
to get a picture of what the data looks like. Compare with e.g. opinion 
polls; you only need to sample a few thousand people to get a result 
with a reasonable error bound.
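
For scale: the stock sampler draws roughly 300 rows per unit of statistics 
target, independent of table size, so a target of 80 works out to about 
300 * 80 = 24000 sampled rows, which matches the ANALYZE VERBOSE output 
above. A minimal sketch, with a hypothetical table name:

    SET default_statistics_target = 80;  -- sample size is roughly 300 * 80 = 24000 rows
    ANALYZE VERBOSE bigtable;            -- VERBOSE reports how many rows were sampled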


That's for estimating the histogram. Estimating ndistinct is a different 
story, and it's well-known that the estimates of ndistinct are sometimes 
wildly wrong.
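
When a bad ndistinct estimate is the culprit, it can also be overridden 
per column rather than raising the statistics target; a minimal sketch, 
assuming a hypothetical table bigtable with a column account_id (the 
option exists in 9.0 and later):

    ALTER TABLE bigtable ALTER COLUMN account_id SET (n_distinct = -0.05);
    -- negative values mean a fraction of the row count; applied at the next analyze
    ANALYZE bigtable;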



What is the unit-of-measure used for default_statistics_target?


It's the number of entries stored in the histogram and 
most-common-values list in pg_statistic.


See also http://www.postgresql.org/docs/devel/static/planner-stats.html:

The amount of information stored in pg_statistic by ANALYZE, in 
particular the maximum number of entries in the most_common_vals and 
histogram_bounds arrays for each column, can be set on a 
column-by-column basis using the ALTER TABLE SET STATISTICS command, or 
globally by setting the default_statistics_target configuration 
variable. The default limit is presently 100 entries.
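
A minimal sketch of the per-column form, with hypothetical table and 
column names, plus a check of how many entries actually got stored:

    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
    ANALYZE orders;
    SELECT attname, array_length(most_common_vals, 1) AS mcv_entries
    FROM pg_stats
    WHERE tablename = 'orders' AND attname = 'customer_id';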


- Heikki




Re: [PERFORM] Analyze and default_statistics_target

2013-01-21 Thread Albe Laurenz
AJ Weber wrote:
 What is the unit-of-measure used for default_statistics_target?

Number of entries in pg_stats.histogram_bounds or pg_stats.most_common_vals.
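
Both counts can be read back directly; a quick sketch, assuming a 
hypothetical table mytable:

    SELECT attname,
           array_length(most_common_vals, 1) AS mcv_entries,
           array_length(histogram_bounds, 1) AS histogram_entries
    FROM pg_stats
    WHERE tablename = 'mytable';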

Yours,
Laurenz Albe



Re: [PERFORM] Analyze on temp table taking very long

2011-03-29 Thread Mahadevan, Mridula
Thanks for the tip. I'll also check on the locks; it's a customer setup and we 
don't get access to the box very frequently. 
Also, the code was something like this:

    loop
        insert data into tmptbl
        analyze tmptbl
    end loop

If I replace this with

    loop
        insert data into tmptbl
    end loop
    analyze tmptbl

it goes through fine.
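
A self-contained sketch of the working shape, with purely hypothetical 
table and data:

    DO $$
    BEGIN
        CREATE TEMP TABLE tmptbl (id int, payload text) ON COMMIT DROP;
        FOR i IN 1..100 LOOP
            INSERT INTO tmptbl
            SELECT g, md5(g::text) FROM generate_series(1, 1000) AS g;
        END LOOP;
        ANALYZE tmptbl;  -- one pass after all the inserts, not once per iteration
    END $$;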

-mridula



-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, March 22, 2011 3:57 PM
To: Mahadevan, Mridula
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Analyze on temp table taking very long

Mahadevan, Mridula mridula.mahade...@ironmountain.com writes:
 This has been running fine for a while on multiple setups, large and small 
 volumes. The setups all have the same hardware configuration.

 On one particular setup with about 200k records, this analyze runs for 
 45 min and then times out (statement timeout is set to 45 min); typically it 
 takes a few seconds at best. But when I move the analyze outside the loop, 
 everything runs fine.

Is it actually *running*, as in doing something, or is it just blocked?
I can't immediately think of any reason for some other process to have
a lock on a temp table that belongs to your process; but it seems
unlikely that ANALYZE would randomly take much longer than expected
unless something was preventing it from making progress.

Look into pg_locks and/or watch the backend with strace next time this
happens.

regards, tom lane








[PERFORM] Analyze on temp table taking very long

2011-03-22 Thread Mahadevan, Mridula
I have a function wherein, in a cursor loop, I

1. create a temp table (on commit drop)
2. insert data into it
3. run ANALYZE on the table

and then select/update outside the loop.

This has been running fine for a while on multiple setups, large and small 
volumes. The setups all have the same hardware configuration.

On one particular setup with about 200k records, this analyze runs for 45 min 
and then times out (statement timeout is set to 45 min); typically it takes a 
few seconds at best. But when I move the analyze outside the loop, everything 
runs fine.


A section of the code, for reference:

vSQL := 'CREATE TEMP TABLE tmp_hierarchy_sorted (
    sort_id serial,
    aqu_document_id integer,
    parent_id integer,
    ancestor_id integer,
    object_hierarchy character varying(255),
    object_hierarchy_array text[],
    levels integer
) ON COMMIT DROP TABLESPACE tblspc_tmp;
CREATE UNIQUE INDEX tmp_hierarchy_sorted_aqu_document_id_idx
    ON tmp_hierarchy_sorted USING btree (aqu_document_id) TABLESPACE tblspc_index;';
execute vSQL;

--get min doc number for that collection based on existing promoted collections in the matter
select coalesce(max(doc_number_max), 0) into iMin_Doc_number
FROM doc_Collection c
WHERE exists (SELECT 1 FROM doc_collection c1
              WHERE c1.id = iCollectionId
                and c1.matter_id = c.matter_id
                and c1.doc_number_prefix = c.doc_number_prefix)
  AND status = 'PROMOTED';

--go ancestor by ancestor for ones that are not loose files
open curAncestor for
    select distinct id FROM aqu_document_hierarchy h
    where collection_Id = iCollectionId and ancestor_id = -1 and parent_id = -1
      AND EXISTS (select 1 from aqu_document_hierarchy h1 where h1.ancestor_id = h.id)
    order by id;
LOOP
    FETCH curAncestor into iAncestor_id;
    EXIT WHEN NOT FOUND;

    --insert each ancestor into the table as this is not part of the bulk insert
    vSQL := 'INSERT INTO tmp_hierarchy_sorted (aqu_document_id, parent_id, ancestor_id, object_hierarchy, object_hierarchy_array, levels)
             (select id, -1, -1, object_hierarchy, regexp_split_to_array(object_hierarchy, ''/''), 0
              from aqu_document_hierarchy
              where collection_Id = ' || iCollectionId || ' AND id = ' || iAncestor_id || ')';
    execute vSQL;

    -- insert filtered documents for that ancestor
    vSQL := 'INSERT INTO tmp_hierarchy_sorted (aqu_document_id, parent_id, ancestor_id, object_hierarchy, object_hierarchy_array, levels)
             (SELECT id, parent_id, ancestor_id, object_hierarchy,
                     regexp_split_to_array(object_hierarchy, ''/'') as object_hierarchy_array,
                     array_length(regexp_split_to_array(object_hierarchy, ''/''), 1) as levels
              FROM aqu_document_hierarchy h
              WHERE EXISTS (SELECT 1 FROM aqu_document_error_details e
                            WHERE e.aqu_document_id = h.id and e.exit_status in (2,3,4,5))
                AND ancestor_id = ' || iAncestor_id || '
              ORDER BY regexp_split_to_array(object_hierarchy, ''/''))';
    execute vSQL;

    ANALYZE tmp_hierarchy_sorted;

END LOOP;



Thanks for the help
-mridula




Re: [PERFORM] Analyze on temp table taking very long

2011-03-22 Thread Tom Lane
Mahadevan, Mridula mridula.mahade...@ironmountain.com writes:
 This has been running fine for a while on multiple setups, large and small 
 volumes. The setups all have the same hardware configuration.

 On one particular setup with about 200k records, this analyze runs for 
 45 min and then times out (statement timeout is set to 45 min); typically it 
 takes a few seconds at best. But when I move the analyze outside the loop, 
 everything runs fine.

Is it actually *running*, as in doing something, or is it just blocked?
I can't immediately think of any reason for some other process to have
a lock on a temp table that belongs to your process; but it seems
unlikely that ANALYZE would randomly take much longer than expected
unless something was preventing it from making progress.

Look into pg_locks and/or watch the backend with strace next time this
happens.
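
A starting point for the pg_locks check, as a hedged sketch:

    SELECT pid, locktype, relation::regclass AS rel, mode, granted
    FROM pg_locks
    WHERE NOT granted;  -- any row here is a backend waiting on a lock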

regards, tom lane



Re: [PERFORM] analyze

2008-01-29 Thread Andrew Sullivan
On Tue, Jan 29, 2008 at 04:28:45PM +0200, Adrian Moisey wrote:
 
 Seriously though, how do I try to measure this?

Is autovacuum not going to work for your case? 

A




[PERFORM] analyze

2008-01-29 Thread Adrian Moisey

Hi

How long is a piece of string?

While we're at it, how often do I vacuum analyze?

Seriously though, how do I try to measure this?

--
Adrian Moisey
System Administrator | CareerJunction | Your Future Starts Here.
Web: www.careerjunction.co.za | Email: [EMAIL PROTECTED]
Phone: +27 21 686 6820 | Mobile: +27 82 858 7830 | Fax: +27 21 686 6842



Re: [PERFORM] analyze

2008-01-29 Thread Scott Marlowe
On Jan 29, 2008 8:28 AM, Adrian Moisey [EMAIL PROTECTED] wrote:
 Hi

 How long is a piece of string?

 While we're at it, how often do I vacuum analyze?

 Seriously though, how do I try to measure this?

1: Turn on autovacuum.
2: Look up the thread on nagios plugins for pgsql and rip the query
for checking bloat out of it, or alternatively, use Nagios itself, I guess. :)

Monitor your db for bloat, and if autovacuum isn't keeping up, either
set it to be more aggressive or schedule manual vacuums for the tables
that need it.
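
One quick way to see whether autovacuum is keeping up, sketched against 
the standard stats views (column availability varies by version):

    SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    ORDER BY relname;  -- tables autovacuum never touches may need manual attention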



[PERFORM] Analyze makes queries slow...

2003-08-14 Thread Stef
Hi all,

I posted this problem on the sql list, and was referred to this list instead.
I have attached an sql statement that normally runs in under 3 minutes.
That is, until I vacuum analyze the database (or just the tables in the query);
then the same query runs longer than 12 hours, and I have to kill it.

However, 90% of queries are faster after analyzing on this database;
there are two or three, including this one, that take forever.

I have tried to reverse-engineer the explain plan from before analyzing,
to come up with an sql statement, using explicit joins, to force the planner
into the original join order, but although I came close, I never got the same 
result as the original query.

I suspect that this might be caused by some of the crazy indexes that 
were built on some of these tables, but I can't really do much about that,
unless I can come up with a very good reason to nuke them.

I also attached the create table statements for all the tables, as well
as a row count of each.

Can somebody help me with guidelines or something similar, 
to understand exactly what is happening in the explain plan?

TIA
Stefan

 Aggregate  (cost=52.00..61.64 rows=32 width=241)
   ->  Group  (cost=52.00..57.62 rows=321 width=241)
         ->  Sort  (cost=52.00..52.80 rows=321 width=241)
               Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, ss.supplier_price_curr_cost, sk.sku_price_curr_cost
               ->  Merge Join  (cost=36.38..38.62 rows=321 width=241)
                     Merge Cond: (("outer".group_cde = "inner".group_cde) AND ("outer".brn_code = "inner".brn_code))
                     Join Filter: (("outer".price_tmpl_hdr_cde = "inner".price_tmpl_hdr_cde) AND ("outer".price_tmpl_hdr_reg = "inner".price_tmpl_hdr_reg))
                     ->  Nested Loop  (cost=0.00..1407212.08 rows=63 width=179)
                           Join Filter: (("inner".sku_mst_cde = "outer".sku) AND ("inner".group_cde = "outer".group_code))
                           ->  Merge Join  (cost=0.00..1405644.89 rows=315 width=135)
                                 Merge Cond: ("outer".group_code = "inner".group_code)
                                 Join Filter: ("outer".sku = "inner".sku)
                                 ->  Nested Loop  (cost=0.00..4826563.70 rows=8694 width=108)
                                       ->  Index Scan using master_fpp_values_idx2 on master_fpp_values m  (cost=0.00..3766902.34 rows=215650 width=54)
                                             Filter: (fpp_code = '200307'::text)
                                       ->  Index Scan using pk_supplier_price on supplier_price ss  (cost=0.00..4.90 rows=1 width=54)
                                             Index Cond: ((ss.group_cde = "outer".group_code) AND (ss.sku_mst_cde = "outer".sku) AND (ss.supplier_cde = "outer".supplier_code))
                                 ->  Index Scan using master_sku_descr_idx3 on master_sku_descr s  (cost=0.00..2535.04 rows=10758 width=27)
                                       Filter: (control_code = '0'::text)
                           ->  Index Scan using idx_sku_price on sku_price sk  (cost=0.00..4.96 rows=1 width=44)
                                 Index Cond: ((sk.group_cde = "outer".group_cde) AND (sk.sku_mst_cde = "outer".sku_mst_cde) AND (sk.price_tmpl_hdr_cde = "outer".price_tmpl_hdr_cde) AND (sk.price_tmpl_hdr_reg = "outer".price_tmpl_hdr_reg))
                     ->  Sort  (cost=36.38..36.87 rows=198 width=62)
                           Sort Key: p.group_cde, p.branch_cde
                           ->  Hash Join  (cost=18.46..28.82 rows=198 width=62)
                                 Hash Cond: ("outer".brn_code = "inner".branch_cde)
                                 ->  Merge Join  (cost=13.94..20.34 rows=198 width=33)
                                       Merge Cond: ("outer".country_code = "inner".from_ctry)
                                       ->  Index Scan using master_branch_descr_idx4 on master_branch_descr b  (cost=0.00..33.12 rows=198 width=15)
                                       ->  Sort  (cost=13.94..13.95 rows=4 width=18)
                                             Sort Key: f.from_ctry
                                             ->  Index Scan using forex_idx1 on forex f  (cost=0.00..13.90 rows=4 width=18)
                                                   Index Cond: ((to_ctry = 'ZAF'::text) AND (fpp_code = '200307'::text))
                                 ->  Hash  (cost=4.02..4.02 rows=202 width=29)
                                       ->  Seq Scan on price_tmpl_det p  (cost=0.00..4.02 rows=202 width=29)
(34 rows)

 Aggregate  (cost=163.58..163.61 rows=1 width=699)
   ->  Group  (cost=163.58..163.60 rows=1 width=699)
         ->  Sort  (cost=163.58..163.58 rows=1 width=699)
               Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, ss.supplier_price_curr_cost, sk.sku_price_curr_cost
               ->  Nested Loop  (cost=115.56..163.57 rows=1 width=699)
                     Join Filter: (("outer".sku = "inner".sku) 

Re: [PERFORM] Analyze makes queries slow...

2003-08-14 Thread Jacek Rembisz
On Mon, Aug 11, 2003 at 03:58:41PM +0200, Stef wrote:

 I have attached an sql statement that normally runs in under 3 minutes.
 That is, until I vacuum analyze the database (or just the tables in the query);
 then the same query runs longer than 12 hours, and I have to kill it.

Hmm, I have noticed a similar problem with a query with an order by ... limit 
clause, although it runs only 10 times slower after analyze :)

The query joins one big table (20 000 rows) with several small tables
(200-4000 rows), then orders by the primary key of the big table, limit 20.

Without the order by ... limit clause, the query is 5 times faster after
analyze.

Looking into the explain analyze outputs:

1. Before vacuum analyze, the planner chooses nested loops; the deepest is:

  ->  Nested Loop  (cost=0.00..116866.54 rows=19286 width=96) (actual time=0.14..1.39 rows=21 loops=1)
        ->  Index Scan Backward using big_table_pkey on big_table k  (cost=0.00..1461.15 rows=19286 width=52) (actual time=0.07..0.47 rows=21 loops=1)
        ->  Index Scan using 4000rows_table_pkey on 4000rows_table zs  (cost=0.00..5.97 rows=1 width=44) (actual time=0.02..0.02 rows=0 loops=21)

2. After analyze, the planner uses hash joins.

When I remove the order by ... limit clause, the query after analyze takes 
the same time, and the query before analyze is much slower.

I won't blame the planner. How could it learn that it should first 
take those 20 rows and then perform the joins? There is a where clause
with a complex exists(subquery) condition on one of the big_table fields,
but removing this condition does not change the query plan.

Pure joining, without any additional conditions and with only the primary key 
of the big table in the select clause, runs 4 times slower than the whole 
query before vacuum analyze :)

Does the planner take the limit clause into consideration at all?

Probably I'm missing something. I don't know much about the planner.

Finally I redesigned the query.

Regards,
Jacek




Re: [PERFORM] Analyze makes queries slow...

2003-08-14 Thread Tom Lane
Stef [EMAIL PROTECTED] writes:
 = Could we see the results of EXPLAIN ANALYZE, rather than just EXPLAIN,
 = for the un-analyzed case? 

 Attached the output of this.

Hmm... not immediately obvious where it's going wrong.  Could you try
this (after ANALYZE):

set enable_mergejoin to off;
explain analyze   ... query ...

If it finishes in a reasonable amount of time, send the explain output.
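
Spelled out, the experiment looks like this; the stand-in query is 
hypothetical, since the real one was attached upthread:

    SET enable_mergejoin TO off;  -- session-local planner toggle
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM pg_class c
    JOIN pg_attribute a ON a.attrelid = c.oid;  -- stand-in for the problem query
    RESET enable_mergejoin;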

regards, tom lane
