Re: [PERFORM] Slow Delete : Seq scan instead of index scan

2012-10-17 Thread Craig Ringer

On 10/16/2012 04:41 PM, Filippos Kalamidas wrote:

the first thing you should probably do is run an 'analyze' on one of
these tables and then run the delete statement again. if there are no
stats for these tables, it's normal not to have very good plans.


Yep, and the fact that the stats are that bad suggests that autovacuum 
probably isn't running, or isn't running often enough.


If you have a high INSERT/UPDATE/DELETE load, then turn autovacuum up on 
that table. See:


  http://www.postgresql.org/docs/current/static/routine-vacuuming.html


http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html


If the table is badly bloated it might be worth running VACUUM FULL on 
it or (if you're on PostgreSQL 8.4 or below) instead CLUSTER the table 
on an index, as VACUUM FULL is very inefficient in 8.4 and older (I 
think; I might be misremembering the versions).
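
For example, roughly like this (a sketch only; the table name, index name and
thresholds here are placeholders to adapt):

ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.05,
                            autovacuum_analyze_scale_factor = 0.02);

-- If the table is already badly bloated:
VACUUM FULL busy_table;                    -- rewrites the table; old-style and very slow on 8.4 and older
CLUSTER busy_table USING busy_table_pkey;  -- alternative on 8.4 and older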



Please specify your PostgreSQL version in all questions. See 
https://wiki.postgresql.org/wiki/Slow_Query_Questions


--
Craig Ringer





[PERFORM] pgbouncer max_client_conn and default_pool_size

2012-10-17 Thread Yetkin Öztürk
What are adequate pgbouncer max_client_conn and default_pool_size values
for a Postgres config which has max_connections = 400?

We want to move to pgbouncer so Postgres only has to do the database work, but
it has us confused. We have over 120 databases in a single Postgres instance
with, as I said, max_connections = 400.

The 120+ databases are driven from 3 separate web servers, and we are planning
to install pgbouncer on all three of them.

PS: from the pgbouncer source code, it looks like it opens separate fds
for the max_client_conn * default_pool_size value.
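
For context, the two settings we're asking about live in pgbouncer.ini roughly
like this (values here are placeholders, not a recommendation):

[pgbouncer]
listen_port = 6432
pool_mode = transaction
max_client_conn = 1000     ; total client connections pgbouncer will accept
default_pool_size = 20     ; server connections per database/user pair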

Regards,

Yetkin Öztürk


[PERFORM] have: seq scan - want: index scan

2012-10-17 Thread Chris Ruprecht
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows.
There is a timestamp field that contains data.
More data gets copied from another database into this database.
How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.
I tested different indexes - same behavior.

The query does this:

SELECT   
audittrailclinical.pgid,   
audittrailclinical.timestamp,   
mmuser.logon,  
audittrailclinical.entityname,   
audittrailclinical.clinicalactivity,   
audittrailclinical.audittraileventcode,   
account.accountnumber,   
patient.dnsortpersonnumber  
FROM   
public.account,   
public.audittrailclinical,   
public.encounter,   
public.entity,   
public.mmuser,   
public.patient,   
public.patientaccount 
WHERE   
  audittrailclinical.encountersid = encounter.encountersid 
and   audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone 
and   audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone
AND  encounter.practiceid = patient.practiceid 
AND  encounter.patientid = patient.patientid 
AND  encounter.staffid = patient.staffid 
AND  entity.entitysid = audittrailclinical.entitysid 
AND  mmuser.mmusersid = audittrailclinical.mmusersid 
AND  patient.practiceid = patientaccount.practiceid 
AND  patient.patientid = patientaccount.patientid 
AND  patientaccount.accountsid = account.accountsid 
AND  patientaccount.defaultaccount = 'Y' 
AND  patient.dnsortpersonnumber = '347450' ;

The query plan says:

  ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)
        Filter: ((timestamp >= '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp <= '2012-10-05 00:00:00'::timestamp without time zone))

which takes forever.

How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.

the table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
  audittrailid text,
  audittraileventcode text,
  clinicalactivity text,
  eventsuccessful text,
  externalunique text,
  recordstamp timestamp without time zone,
  recorddescription text,
  encountersid integer,
  eventuserlogon text,
  computername text,
  applicationcode text,
  practiceid integer,
  mmusersid integer,
  entitysid integer,
  entityname text,
  timestamp timestamp without time zone,
  lastuser integer,
  createstamp timestamp without time zone,
  pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE audittrailclinical
  OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
  ON audittrailclinical
  USING btree
  (entitysid , timestamp );

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
  ON audittrailclinical
  USING btree
  (audittrailid COLLATE pg_catalog."default");








[PERFORM] Out of shared mem on new box with more mem, 9.1.5 -> 9.1.6

2012-10-17 Thread Maciek Sakrejda
We've run into a perplexing issue with a customer database. He moved
from a 9.1.5 to a 9.1.6 and upgraded from an EC2 m1.medium (3.75GB
RAM, 1.3 GB shmmax), to an m2.xlarge (17GB RAM, 5.7 GB shmmax), and is
now regularly getting constant errors regarding running out of shared
memory (there were none on the old system in the recent couple of
days' logs from before the upgrade):

ERROR:  out of shared memory
HINT:  You might need to increase max_pred_locks_per_transaction.

The query causing this has structurally identical plans on both systems:

old: http://explain.depesz.com/s/Epzq
new: http://explain.depesz.com/s/WZo

The settings ( select name, setting from pg_settings where source <>
'default' and name not like 'log%' and name not like 'ssl%' and name
not like 'syslog%' ) are almost identical
(max_pred_locks_per_transaction itself is at the default):

17c17
  effective_cache_size | 153
---
  effective_cache_size | 337500
38c38
  shared_buffers   | 424960
---
  shared_buffers   | 93696

The kernels are both 2.6.32. The workload has not changed
significantly. Could something in 9.1.6 be to blame here? Looking at
the changelog, this seems vanishingly unlikely. Any ideas?




Re: [PERFORM] Out of shared mem on new box with more mem, 9.1.5 -> 9.1.6

2012-10-17 Thread Martin French
 
 We've run into a perplexing issue with a customer database. He moved
 from a 9.1.5 to a 9.1.6 and upgraded from an EC2 m1.medium (3.75GB
 RAM, 1.3 GB shmmax), to an m2.xlarge (17GB RAM, 5.7 GB shmmax), and is
 now regularly getting constant errors regarding running out of shared
 memory (there were none on the old system in the recent couple of
 days' logs from before the upgrade):
 
 ERROR:  out of shared memory
 HINT:  You might need to increase max_pred_locks_per_transaction.
 
 The query causing this has structurally identical plans on both systems:
 
 old: http://explain.depesz.com/s/Epzq
 new: http://explain.depesz.com/s/WZo
 
 The settings ( select name, setting from pg_settings where source <>
 'default' and name not like 'log%' and name not like 'ssl%' and name
 not like 'syslog%' ) are almost identical
 (max_pred_locks_per_transaction itself is at the default):
 
 17c17
   effective_cache_size | 153
 ---
   effective_cache_size | 337500
 38c38
   shared_buffers   | 424960
 ---
   shared_buffers   | 93696
 
 The kernels are both 2.6.32. The workload has not changed
 significantly. Could something in 9.1.6 be to blame here? Looking at
 the changelog, this seems vanishingly unlikely. Any ideas?
 


What are the settings for:

work_mem
maintenance_work_mem

How many concurrent connections are there?

Have you run explain analyze on the query that doesn't crash (i.e. the old 
box) to get the exact execution plan?

Has the DB been vacuum analyzed?

Cheers


Re: [PERFORM] have: seq scan - want: index scan

2012-10-17 Thread Samuel Gendler
On Tue, Oct 16, 2012 at 4:45 PM, Chris Ruprecht ch...@cdrbill.com wrote:

 Hi guys,

 PG = 9.1.5
 OS = winDOS 2008R8

 I have a table that currently has 207 million rows.
 there is a timestamp field that contains data.
 more data gets copied from another database into this database.
 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.
 I tested different indexes - same behavior.


 The query plan says:

   ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)
         Filter: ((timestamp >= '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp <= '2012-10-05 00:00:00'::timestamp without time zone))

 which takes forever.

 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.


The planner estimates that 203 million out of 207 million rows match your
timestamp filter, so it is definitely going to favour a sequential scan. An
index scan that matches that many rows will inevitably be slower than simply
scanning the table, since it has to do the index lookups and also load the
actual records from the table (all of them, basically) in order to determine
their visibility to you; the index scan would just turn sequential access of
the table pages into random access and add index lookups on top. You can
possibly verify this by setting enable_seqscan to false and running your
explain again to see how the plan changes, though I don't believe that will
necessarily remove all sequential scans, it just reduces their likelihood, so
you may see that nothing changes. If the estimate for the number of matching
rows is incorrect, you'll want to increase the statistics gathering for that
table or just that column.

ALTER TABLE table ALTER COLUMN column SET STATISTICS number

where number is between 10 and 1000 and I think the default is 100.  Then
re-analyze the table and see if the query plan shows better estimates.  I
think 9.2 also supports index only scans which eliminate the need to load
the matched records in certain circumstances. However, all of the columns
used by the query would need to be in the index, and you are using an awful
lot of columns between the select clause and the table joins.
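
Concretely, the two checks above might look like this (a sketch; the cut-down
query and the statistics target of 500 are just examples to adapt):

-- See what plan you get when seqscans are penalized (session-local, for testing only)
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT pgid FROM public.audittrailclinical
WHERE "timestamp" >= '2008-01-01' AND "timestamp" < '2012-10-05';
RESET enable_seqscan;

-- Raise the stats target on the filtered column, then refresh the stats
ALTER TABLE audittrailclinical ALTER COLUMN "timestamp" SET STATISTICS 500;
ANALYZE audittrailclinical;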

Are you lacking indexes on the columns used for joins that would allow more
selective index scans on those columns, which could then just filter by
timestamp? I'm not much of an expert on the query planner, so I'm not sure
exactly what causes that behaviour, but I'd think that good statistics and
useful indexes should let the rest of the where clause be more selective of
the rows from audittrailclinical, unless patientaccount.defaultaccount = 'Y'
and patient.dnsortpersonnumber = '347450' are similarly non-selective.
patient.dnsortpersonnumber would seem like it is probably the strong filter,
so make sure you've got indexes and accurate stats on all of the foreign keys
that connect the patient table and the audittrailclinical table. It'd be
useful to see the rest of the explain analyze output so we could see how it
is handling the joins and why. Note that because you have multiple composite
foreign keys joining tables in your query, you almost certainly won't have
those composite keys in a single index. If you have indexes on those columns
but they are single-column indexes, that may be what is causing the planner
to try to filter the atc table on the timestamp rather than via the joins.
I'm sure someone more knowledgeable than I will be along eventually to
correct any misinformation I may have passed along. Without knowing anything
about your schema or the rest of the explain analyze output, I'm mostly just
guessing. There is an entire page devoted to formulating useful mailing list
questions, incidentally; yours really isn't. And if the atc table definition
is complete, you are definitely missing potentially useful indexes, since you
are joining to that table via encountersid and you don't show an index on
that column, yet that is the column that eventually joins out to the patient
and patientaccount tables, which have the stronger filters on them.
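
For example, something along these lines might help (a sketch only; the index
names are made up, and whether (encountersid) alone or (encountersid, timestamp)
is better depends on the data):

CREATE INDEX atc_encountersid_idx
  ON audittrailclinical (encountersid);

CREATE INDEX atc_encountersid_time_idx
  ON audittrailclinical (encountersid, "timestamp");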

Incidentally, why the join to the entity table via entitysid?  No columns
from that table appear to be used anywhere else in the query.

--sam


Re: [PERFORM] Out of shared mem on new box with more mem, 9.1.5 -> 9.1.6

2012-10-17 Thread Maciek Sakrejda
On Wed, Oct 17, 2012 at 1:53 AM, Martin French
martin.fre...@romaxtech.com wrote:

Thanks for your response.

 What are the settings for:
 work_mem
 100MB

 maintenance_work_mem
 64MB

 How many concurrent connections are there?
~20

 Have you run explain analyze on the query that doesn't crash (i.e. the old
 box) to get the exact execution plan?

I can try that in the morning, but I didn't think this was relevant. I
know cost estimates can be off, but can the plan actually change
between a vanilla explain and an explain analyze?

 Has the DB been vacuum analyzed?

Not outside of autovacuum, no, but it's actually a former replica of
the first database (sorry I neglected to mention this earlier).




Re: [PERFORM] Out of shared mem on new box with more mem, 9.1.5 -> 9.1.6

2012-10-17 Thread Martin French
 On Wed, Oct 17, 2012 at 1:53 AM, Martin French
 martin.fre...@romaxtech.com wrote:
 
 Thanks for your response.
 
  What are the settings for:
  work_mem
  100MB
This is a little higher than I would ordinarily set. I tend to cap at 
about 64MB

 
  maintenance_work_mem
  64MB
In contrast, this is a little low for me, but I guess that table size is a 
big factor here.

 
  How many concurrent connections are there?
 ~20
 
  Have you run explain analyze on the query that doesn't crash (i.e. the old
  box) to get the exact execution plan?
 
 I can try that in the morning, but I didn't think this was relevant. I
 know cost estimates can be off, but can the plan actually change
 between a vanilla explain and an explain analyze?
 
The explain analyze gives a more detailed output.

 
  Has the DB been vacuum analyzed?
 
 Not outside of autovacuum, no, but it's actually a former replica of
 the first database (sorry I neglected to mention this earlier).
 

This may be worthwhile. Even with autovacuum on, I still vacuum analyze 
manually during quiet periods. Whether it's actually necessary or not, I 
figure it's belt and braces.

Looking at the explain, it'd suggest the tables aren't very large, so I 
can't really see why there'd be a problem, notwithstanding the fact that 
you have only a relatively small shared_buffers.

Are there no other messages in the log files re: out of memory? There 
should be a dump which will show you where the memory usage is occurring.

Other than that, you may want to consider increasing shared_buffers and see 
if that has any effect. Alternatively, you may want to increase 
max_pred_locks_per_transaction beyond the default of 64, although this is 
not a parameter I've yet had to adjust.
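
A sketch of the relevant postgresql.conf change (the value is only an example;
it takes effect after a server restart):

max_pred_locks_per_transaction = 128    # default is 64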

Cheers


Re: [PERFORM] SELECT AND AGG huge tables

2012-10-17 Thread houmanb
Hi all,
Thanks for your advice and the link about posting my question in an
appropriate form.
Here is the info. I thank all of you in advance.

Best regards
Houman



Postgres version: 9.1.4
=
Postgres.conf
max_connections = 100
shared_buffers = 8192MB
work_mem = 500MB
log_statement = 'none'
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
max_locks_per_transaction = 256

=
Hardware: 
CPU Quad Core Intel CPU
processor   : 0-7
vendor_id   : GenuineIntel
cpu family  : 6
model   : 45
model name  : Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz

Memory:
MemTotal:   32927920 kB

HDD:
OCZ VeloDrive - Solid State Disk - 600 GB - internal - PCI Express 2.0 x8
Multi-Level-Cell (MLC)
PCI Express 2.0 x8
IO/stat===
iostat sdb1 1
Linux 3.2.0-23-generic (regula2)   10/17/2012   _x86_64_   (8 CPU)
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sdb1  6.44   217.91   240.45 1956400373 2158777589
sdb1  0.00 0.00 0.00  0  0
sdb1  0.00 0.00 0.00  0  0
sdb1  0.00 0.00 0.00  0  0
sdb1  0.00 0.00 0.00  0  0
sdb1  0.00 0.00 0.00  0  0
=vmstat==
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0  44376 2417096 210784 2866402400303500  0  0 100  0
 0  0  44376 2416964 210784 2866402400 0 0   80  138  0  0 100  0
 1  0  44376 2416592 210784 2866402400 0 0  278  228  7  0 93  0
 1  0  44376 2416592 210784 2866428000 0 0  457  305 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  472  303 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  462  296 13  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  478  293 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  470  317 12  0 87  0
 1  0  44376 2416716 210784 2866428000 0 0  455  299 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  459  301 12  0 87  0
 1  0  44376 2416716 210784 2866428000 0 0  370  291  7  5 88  0
 1  0  44376 2416716 210784 2866428000 029  459  319 12  1 88  0
 1  0  44376 2416716 210784 2866428000 0 0  453  295 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  449  284 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 8  462  304 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  459  307 12  0 88  0
 2  0  44376 2416716 210784 2866428000 0 0  461  300 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  457  299 12  0 87  0
 1  0  44376 2416716 210784 2866428000 0 0  439  295 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  439  306 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  448  305 12  0 88  0
 1  0  44376 2416716 210784 2866428000 0 0  457  289 12  0 88  0
 0  0  44376 2416716 210784 2866428000 0 0  174  179  3  0 97  0
 0  0  44376 2416716 210784 2866428000 0 0   73  133  0  0 100  0
 0  0  44376 2416716 210784 2866428000 0 0   75  133  0  0 100  0
 0  0  44376 2416716 210784 2866428000 0 0   70  127  0  0 100  0



       Column       |            Type             |                       Modifiers
--------------------+-----------------------------+--------------------------------------------------------
 modifying_action   | integer                     | 
 modifying_client   | integer                     | 
 modification_time  | timestamp without time zone | 
 instance_entity    | integer                     | 
 id                 | integer                     | not null default nextval('enigma.fact_seq'::regclass)
 successor          | integer                     | 
 reporting_date     | integer                     | 
 legal_entity       | integer                     | 
 client_system      | integer                     | 
 customer           | integer                     | 
 customer_type      | integer                     | 
 borrower           | integer                     | 
 nace               | integer                     | 
 lsk                | integer                     | 
 review_date

[PERFORM] High cost estimates when n_distinct is set

2012-10-17 Thread Niko Kiirala
Hello

I am working on a potentially large database table, let's call it 
observation, that has a foreign key to table measurement. Each measurement 
is associated with either none or around five observations. In this kind of 
situation, it is well known that the statistics on the foreign key column in 
observation table can get arbitrarily bad as the row count increases. 
Especially, the estimate of the number of distinct values in the foreign key 
column can be completely off.

To combat this issue I have set n_distinct=-0.2 on the foreign key column. With 
this, the query planner gets good estimates of row counts, but it would appear 
that this setting does not affect the cost estimate of an index scan. With 
this, I get odd cost estimates, as if fetching these approximately five rows 
using an index scan would take hundreds of random disk reads. Due to this high 
cost estimate, when joining these two tables, PostgreSQL changes from using 
multiple small index scans to scanning the whole table a lot earlier than would 
be beneficial.

So, in more detail:

I am using PostgreSQL 9.2.1 on Windows 7 SP 1 installed with EnterpriseDB 
one-click installer and with default settings.
SELECT version();
PostgreSQL 9.2.1, compiled by Visual C++ build 1600, 64-bit

I have set up a simple testing database as follows (running these commands 
takes around an hour on my DB. It's slow, sorry, but I need lots of rows to 
show the issue):

CREATE TABLE observation
(
  id bigserial NOT NULL,
  measurement_id bigint NOT NULL,
  CONSTRAINT observation_pkey PRIMARY KEY (id)
);

CREATE INDEX observation_measurement_id_idx
  ON observation
  USING btree
  (measurement_id);

INSERT INTO observation
  SELECT x as id,
    CASE WHEN (x - 1) % 15 < 3 THEN ((x - 1) / 15) * 4
         WHEN (x - 1) % 15 < 8 THEN ((x - 1) / 15) * 4 + 1
         ELSE ((x - 1) / 15) * 4 + 2
    END AS measurement_id
  FROM (SELECT generate_series(1, 100000000) AS x) AS series;
  
ANALYZE observation;

Here the measurement_id stands for the foreign key to the measurement table. 
Actually having that table is not required to show the issue, though. Each 
number in the range from 1 up to about 1e8 * 4 / 15 (roughly 26.7 million) 
appears 0, 3, 5 or 7 times in the measurement_id column.
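
A rough sanity check of that distribution can be run directly against the table
(a sketch; it is slow on 1e8 rows, and ids that appear zero times naturally
do not show up):

SELECT per_id AS times_repeated, count(*) AS how_many_ids
FROM (SELECT measurement_id, count(*) AS per_id
      FROM observation
      GROUP BY measurement_id) s
GROUP BY per_id
ORDER BY per_id;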

After this, the statistics on measurement_id column look something like this:
Distinct Values  1.23203e+006
Most Common Values {6895590, 8496970, 23294094, 75266, 128877, 150786, 175001, 
192645, 216918, 262742, ...
Most Common Frequencies {0.0001, 0.0001, 0.0001, 6.7e-005, 6.7e-005, 
6.7e-005, 6.7e-005, 6.7e-005, 6.7e-005, 6.7e-005, ...
Histogram Bounds {14, 208364, 511400, 840725, 1091642, 1392585, 1713998, 
1945482, 2204897, 2476654, ...
Correlation 1

As there are 20e6 distinct values, the 1.2e6 estimate is already somewhat off 
and will keep getting worse as more rows are added.

Let's have a look on the query plan of fetching all observations of a single 
measurement:
EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS on, TIMING on )
SELECT id, measurement_id
  FROM observation
  WHERE measurement_id = 21;

Index Scan using observation_measurement_id_idx on public.observation  
(cost=0.00..119.36 rows=82 width=16) (actual time=0.060..0.062 rows=5 loops=1)
  Output: id, measurement_id
  Index Cond: (observation.measurement_id = 21)
  Buffers: shared read=5
Total runtime: 0.081 ms

The row estimate is off by a factor of 10, so let's help the planner and tell 
it how many rows it is likely to find:
ALTER TABLE observation
  ALTER COLUMN measurement_id
  SET (n_distinct = -0.2);
ANALYZE observation;

This doesn't change the row statistics noticeably, except for changing the 
number of distinct values.
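
A quick way to confirm the override took effect is to look at the standard
pg_stats view (a sketch):

SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'observation' AND attname = 'measurement_id';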

EXPLAIN (ANALYZE on, VERBOSE on, COSTS on, BUFFERS on, TIMING on )
SELECT id, measurement_id
  FROM observation
  WHERE measurement_id = 21;

Index Scan using observation_measurement_id_idx on public.observation  
(cost=0.00..118.01 rows=5 width=16) (actual time=0.060..0.061 rows=5 loops=1)
  Output: id, measurement_id
  Index Cond: (observation.measurement_id = 21)
  Buffers: shared read=5
Total runtime: 0.073 ms

So, the row count estimate is now good, but the cost estimate has not changed 
much. If I halve the random_page_cost (from 4 to 2), it nearly exactly halves 
the estimated cost, so it would appear this cost is almost completely from 
random page accesses, approximately 29 of them. In the original plan this made 
sense, as it expected to fetch 81 rows, but for five rows it would seem quite 
excessive. 

Enlarging the table from 100 million to 200 million rows almost doubles the 
estimated cost (from 118.01 to 227.69). Since the estimated and actual count of 
returned rows stay the same and the query is using a B tree, I'd expect the 
cost to rise only slightly.

I have a distinct feeling that this is either a bug in the cost estimator or 
there's a quite valid reason which I have missed. Regardless, any insights you 
might have to this issue would be appreciated.


Re: [PERFORM] Out of shared mem on new box with more mem, 9.1.5 -> 9.1.6

2012-10-17 Thread Tom Lane
Maciek Sakrejda m.sakre...@gmail.com writes:
 We've run into a perplexing issue with a customer database. He moved
 from a 9.1.5 to a 9.1.6 and upgraded from an EC2 m1.medium (3.75GB
 RAM, 1.3 GB shmmax), to an m2.xlarge (17GB RAM, 5.7 GB shmmax), and is
 now regularly getting constant errors regarding running out of shared
 memory (there were none on the old system in the recent couple of
 days' logs from before the upgrade):

 ERROR:  out of shared memory
 HINT:  You might need to increase max_pred_locks_per_transaction.

This has nothing to do with work_mem nor maintenance_work_mem; rather,
it means you're running out of space in the database-wide lock table.
You need to take the hint's advice.
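
For reference, the value the server is actually running with can be checked with:

SHOW max_pred_locks_per_transaction;

It requires a server restart to change.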

 The query causing this has structurally identical plans on both systems:

 old: http://explain.depesz.com/s/Epzq
 new: http://explain.depesz.com/s/WZo

The query in itself doesn't seem very exceptional.  I wonder whether
you recently switched your application to use serializable mode?  But
anyway, a query's demand for predicate locks can depend on a lot of
not-very-visible factors, such as how many physical pages the tuples
it accesses are spread across.  I don't find it too hard to credit
that yesterday you were just under the limit and today you're just
over even though nothing changed.
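
For reference, predicate-lock consumption can be watched from pg_locks while the
workload runs (a sketch; SIReadLock is the mode used for predicate locks):

SELECT count(*) AS predicate_locks
FROM pg_locks
WHERE mode = 'SIReadLock';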

regards, tom lane




Re: [PERFORM] Out of shared mem on new box with more mem, 9.1.5 -> 9.1.6

2012-10-17 Thread Maciek Sakrejda
On Wed, Oct 17, 2012 at 7:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 ERROR:  out of shared memory
 HINT:  You might need to increase max_pred_locks_per_transaction.

 This has nothing to do with work_mem nor maintenance_work_mem; rather,
 it means you're running out of space in the database-wide lock table.
 You need to take the hint's advice.

Sure, just trying to understand why this happened in the first place.

 The query in itself doesn't seem very exceptional.  I wonder whether
 you recently switched your application to use serializable mode?

The change (for some transactions) was relatively recent, but predated
the switch to the replica by several days. Before the switch,
everything was running fine.

 But
 anyway, a query's demand for predicate locks can depend on a lot of
 not-very-visible factors, such as how many physical pages the tuples
 it accesses are spread across.  I don't find it too hard to credit
 that yesterday you were just under the limit and today you're just
 over even though nothing changed.

Interesting, thanks for the input. So it could be just a coincidence
that the errors occurred in lock-step with the promotion? Or does a
replica have a different (or different enough) physical layout that
this could have been a factor (my understanding of replication is
relatively high level--read: vague)?




Re: [PERFORM] Two identical systems, radically different performance

2012-10-17 Thread Andrea Suisani

On 10/15/2012 05:34 PM, Scott Marlowe wrote:

On Mon, Oct 15, 2012 at 9:28 AM, Claudio Freire klaussfre...@gmail.com wrote:

On Mon, Oct 15, 2012 at 12:24 PM, Andrea Suisani sick...@opinioni.net wrote:

sure you're right.

It's just that my bet was on a higher throughput
when HT was disabled from the BIOS (as you stated
previously in this thread).


Yes, mine too. It's bizarre. If I were you, I'd look into it more
deeply. It may be a flaw in your test methodology (maybe you disabled
the wrong cores?). If not, it would be good to know why the extra TPS
to replicate elsewhere.


I'd recommend more synthetic benchmarks when trying to compare systems
like this.  bonnie++,


you were right. bonnie++ (-f -n 0 -c 4) shows that there's very little (if any)
difference in terms of sequential input whether or not the cache is enabled on
the RAID1 (SAS 15K, sdb).

I've run 2 bonnie++ tests, with cache enabled and with it disabled, and what I
get (see attachments for more details) is about 400MB/s sequential input (cache)
vs 390MB/s (nocache).

I dunno why but I would have expected a higher delta (due to the 512MB cache)
not a mere 10MB/s, but this is only based on my gut feeling.

I've also tried to test the RAID1 array where the OS is installed (2 SATA 7.2Krpm, sda),
just to verify whether the cache effect is comparable with the one I get from the SAS disks.
just to verify if cache effect is comparable with the one I get from SAS disks.

Well, it seems that there's no cache effect, or if there is one it's so small
as to be lost in the noise.

Both arrays are configured with these params:

Read Policy   : Adaptive Read Ahead
Write Policy  : Write Back
Stripe Element Size   : 64 KB
Disk Cache Policy : Disabled

Those tests were performed with HT disabled from the BIOS, but without
using the noht kernel boot param. The scheduler for sdb was set to deadline,
while sda uses the default cfq.


 the memory stream test that Greg Smith was
working on, and so on.


this one https://github.com/gregs1104/stream-scaling, right?

I've executed the test with HT enabled, HT disabled from the BIOS,
and HT disabled using the sys interface. Attached are 3 graphs and the related
text files.



Get an idea what core differences the machines
display under such testing.


I'm trying... hard :)

Andrea


Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   4 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud    32000M   180829  23 112581  15   401611  19 917.7  10
Latency 445ms 954ms   360ms   64788us

1.96,1.96,cloud,4,1350463530,32000M180829,23,112581,15,,,401611,19,917.7,10,,,445ms,954ms,,360ms,64788us,,
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   4 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud    32000M   179259  23 112093  15   400021  18 600.9  26
Latency 670ms 514ms 99640us   86081us

1.96,1.96,cloud,4,1350465025,32000M179259,23,112093,15,,,400021,18,600.9,26,,,670ms,514ms,,99640us,86081us,,
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   4 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud    32000M   174212  22 108909  15   387598  18 928.9  11
Latency 960ms 1397ms   967ms   71495us

1.96,1.96,cloud,4,1350465002,32000M174212,22,108909,15,,,387598,18,928.9,11,,,960ms,1397ms,,967ms,71495us,,
Version  1.96   --Sequential Output-- --Sequential Input- --Random-
Concurrency   4 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
cloud    32000M   175185  22 109229  15   391409  18 912.9  10
Latency 1002ms 1225ms  1059ms   91373us

1.96,1.96,cloud,4,1350466614,32000M175185,22,109229,15,,,391409,18,912.9,10,,,1002ms,1225ms,,1059ms,91373us,,
attachment: stream-ht_disabled_bios.png

=== CPU cache information ===
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu0 Level 1 Cache: 32K (Instruction)
CPU /sys/devices/system/cpu/cpu0 Level 2 Cache: 256K (Unified)
CPU /sys/devices/system/cpu/cpu0 Level 3 Cache: 15360K (Unified)
CPU /sys/devices/system/cpu/cpu1 Level 1 Cache: 32K (Data)
CPU /sys/devices/system/cpu/cpu1 Level 1 

Re: [PERFORM] Two identical systems, radically different performance

2012-10-17 Thread Scott Marlowe
On Wed, Oct 17, 2012 at 9:45 AM, Andrea Suisani sick...@opinioni.net wrote:
 On 10/15/2012 05:34 PM, Scott Marlowe wrote:
 I'd recommend more synthetic benchmarks when trying to compare systems
 like this.  bonnie++,


 you were right. bonnie++ (-f -n 0 -c 4) show that there's very little (if
 any)
 difference in terms of sequential input whether or not cache is enabled on
 the
 RAID1 (SAS 15K, sdb).

I'm mainly wanting to know the difference between the two systems, so
if you can run it on the old and the new machine and compare the two, that's
the real test.

 I've run 2 bonnie++ test with both cache enabled and disabled and what I get
 (see attachments for more details) it's a 400MB/s sequential input (cache)
 vs
 390MBs (nocache).

 I dunno why but I would have expected a higher delta (due to the 512MB
 cache)
 not a mere 10MB/s, but this is only based on my gut feeling.

Well the sequential throughput doesn't really rely on caching.  It's
the random writes that benefit from caching, and the other things
(random reads and seq read/write) that indirectly benefit because the
random writes are so much faster that they no longer get in the way.
So mostly compare random access between the old and new machines and
look for differences there.
  the memory stream test that Greg Smith was
 working on, and so on.


 this one https://github.com/gregs1104/stream-scaling, right?

Yep.

 I've executed the test with HT enabled, HT disabled from the BIOS
 and HT disable using sys interface. Attached 3 graphs and related
 text files

Well it's pretty meh.  I'd like to see the older machine compared to
the newer one here tho.

 I'm trying... hard :)

You're doing great.  These problems take effort to sort out.




Re: [PERFORM] LIKE op with B-Tree Index?

2012-10-17 Thread Merlin Moncure
On Tue, Oct 16, 2012 at 8:01 PM, Sam Wong s...@hellosam.net wrote:
 On Wednesday, October 17, 2012 4:30, Merlin Moncure wrote,

 On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong s...@hellosam.net wrote:
  Hi communities,
 
  I am investigating a performance issue involving a prefix LIKE ('...%') on
  an index in a complex query with joins.
 
  The problem boils down into this simple scenario---:
  Scenario
  My database locale is C, using UTF-8 encoding. I tested this on 9.1.6
 and 9.
  2.1.
 
  Q1.
  SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
 
  Q2.
  SELECT * FROM shipments WHERE shipment_id >= '12345678' AND
  shipment_id < '12345679'
 
  ...snip...
 
  Question
  Is Q1 and Q2 equivalent? From what I see and the result they seems to
  be the same, or did I miss something? (Charset: C, Encoding: UTF-8) If
  they are equivalent, is that a bug of the planner?

 They are most certainly not equivalent.  What if the shipping_id is
 12345678Z?

 merlin

 But '12345678Z' is indeed >= '12345678' AND < '12345679'. Just like 'apple'
 < 'apples' < 'apply' in a dictionary.

Right -- I didn't visualize it properly.  Still, you're asking the server to
infer that, since you're looking between two adjacent textual strings, range
bounded [), it should convert the 'between' into a partial string search.
That holds up logically but probably isn't worth spending cycles on,
particularly in cases of non-ascii-mappable unicode characters.

merlin




Re: [PERFORM] LIKE op with B-Tree Index?

2012-10-17 Thread Sam Wong
 Moncure wrote on Thursday, October 18, 2012 1:45 
 On Tue, Oct 16, 2012 at 8:01 PM, Sam Wong s...@hellosam.net wrote:
  On Wednesday, October 17, 2012 4:30, Merlin Moncure wrote,
 
  On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong s...@hellosam.net wrote:
   Hi communities,
  
    I am investigating a performance issue involving a prefix LIKE ('...%')
    on an index in a complex query with joins.
  
   The problem boils down into this simple scenario---:
   Scenario
   My database locale is C, using UTF-8 encoding. I tested this on
   9.1.6
  and 9.
   2.1.
  
   Q1.
   SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
  
   Q2.
    SELECT * FROM shipments WHERE shipment_id >= '12345678' AND
    shipment_id < '12345679'
  
   ...snip...
  
   Question
   Is Q1 and Q2 equivalent? From what I see and the result they seems
   to be the same, or did I miss something? (Charset: C, Encoding:
   UTF-8) If they are equivalent, is that a bug of the planner?
 
  They are most certainly not equivalent.  What if the shipping_id is
  12345678Z?
 
  merlin
 
   But '12345678Z' is indeed >= '12345678' AND < '12345679'. Just like
 'apple' < 'apples' < 'apply' in a dictionary.
 
 Right -- I didn't visualize it properly.  Still, you're asking the server
 to infer that, since you're looking between two adjacent textual strings,
 range bounded [), it should convert the 'between' into a partial string
 search. That holds up logically but probably isn't worth spending cycles on,
 particularly in cases of non-ascii-mappable unicode characters.
 merlin

PostgreSQL does that already. Refer to the analyze result of Q1 and Q2; it
gives
Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
'12345679'::text))
(I also just realized they did it just now)
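
For reference, a minimal sketch of how the two plans end up looking (the index
name is made up; costs and row estimates omitted):

Q1: Index Scan using shipments_shipment_id_idx on shipments
      Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id < '12345679'::text))
      Filter: (shipment_id ~~ '12345678%'::text)

Q2: Index Scan using shipments_shipment_id_idx on shipments
      Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id < '12345679'::text))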

Yet, with the additional Filter (ref the Q1 analyze), it is surprising that it
estimates Q1 will return more rows than Q2.

FYI, I made a self-contained test case and submitted a bug #7610.


