Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL

2015-04-01 Thread Ilya Kosmodemiansky
Hi Pietro,

On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni pietro.pu...@gmail.com wrote:
 T420: went from 311seconds (default postgresql.conf) to 195seconds doing
 tuning adjustments over RAID, kernel and postgresql.conf;
 MacMini: 40seconds.

I'm afraid the matter is that PostgreSQL is not configured properly
(and neither are the operating system and probably the controller,
although pg_test_fsync shows that things are not as bad there as with
postgresql.conf).

It is pretty useless to benchmark a database using the out-of-the-box
configuration. You need to configure at least the shared-memory-,
checkpoint- and autovacuum-related settings. And as a first step,
please compare postgresql.conf on the Mac and on the server: with some
Mac installers the default postgresql.conf is not the same as on the
server.
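
As a sketch only (the numbers assume a dedicated server with ~64GB of
RAM and follow the rules of thumb discussed elsewhere on this list;
they are not a recommendation for your exact machine), a first pass
could look like:

  shared_buffers = 16GB                    # ~25% of RAM
  checkpoint_segments = 256                # avoid frequent checkpoints
  checkpoint_timeout = 60min
  checkpoint_completion_target = 0.9       # spread checkpoint IO
  autovacuum_vacuum_scale_factor = 0.01    # aggressive autovacuum
  autovacuum_analyze_scale_factor = 0.05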

Best regards,
Ilya


-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow checkpoints

2015-03-18 Thread Ilya Kosmodemiansky
On Wed, Mar 18, 2015 at 12:58 PM, Steven Jones
steven.jones1...@outlook.com wrote:
 Yes BBU is on the controller; 1024Mb. It is a HP P410i controller, with write 
 caching turned on the controller; off on disk level.


vm.dirty_background_bytes=67108864 and vm.dirty_bytes=536870912 look
reasonable for a 512MB BBU; you can scale them for 1024MB, or
recalculate them in terms of dirty_background_ratio.
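
For example, scaled 2x for a 1024MB cache (a sketch; exact numbers
depend on your workload):

  # /etc/sysctl.conf
  vm.dirty_background_bytes = 134217728    # 128MB
  vm.dirty_bytes = 1073741824              # 1GB

and apply with sysctl -p.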

By the way, which kernel do you use?

 We don't at this stage have any graphs, but we will set it up over the next 
 24hrs at least.

Do not forget to collect iostat statistics on them, at least latency,
%iowait and %util; such parameters are very helpful.
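
For example (the device names are assumptions; -x adds the extended
columns, including await and %util):

  iostat -dxk 5 /dev/sda /dev/sdb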

And I am always suspicious of ZFS under heavy writes. It is reliable
and quite comfortable in terms of configuration, but for speed, ext4
or xfs with barriers disabled looks more reasonable.
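
For example, assuming the data volume is /dev/md2 (do this only with a
battery-backed write cache):

  mount -o remount,nobarrier /dev/md2
  # and in /etc/fstab: ... ext4 noatime,nodiratime,nobarrier 0 1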


 --
 Ilya Kosmodemiansky,

 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow checkpoints

2015-03-18 Thread Ilya Kosmodemiansky
On Wed, Mar 18, 2015 at 1:21 PM, Steven Jones
steven.jones1...@outlook.com wrote:
 #autovacuum = on# Enable autovacuum subprocess?  'on'
 #log_autovacuum_min_duration = -1   # -1 disables, 0 logs all actions and
 #autovacuum_max_workers = 3 # max number of autovacuum 
 subprocesses
 #autovacuum_naptime = 1min  # time between autovacuum runs
 autovacuum_vacuum_threshold = 500   # min number of row updates before
 autovacuum_analyze_threshold = 500  # min number of row updates before
 #autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
 #autovacuum_analyze_scale_factor = 0.1  # fraction of table size before 
 analyze

If I were you, I'd use the _scale_factor settings instead of the
thresholds, because they make your autovacuum aggressive enough (you
need that on such a workload) without firing too frequently (vacuuming
has its price). autovacuum_vacuum_scale_factor = 0.01 and
autovacuum_analyze_scale_factor = 0.05 will be OK.

And if you see all your autovacuum workers active all the time (more
than 80% of the time, for example), that is a reason to increase
autovacuum_max_workers.
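
One way to watch that, sampled periodically (a sketch for 9.2+, where
pg_stat_activity has the pid and query columns):

  SELECT pid, now() - xact_start AS running_for, query
  FROM pg_stat_activity
  WHERE query LIKE 'autovacuum:%';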

-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow checkpoints

2015-03-18 Thread Ilya Kosmodemiansky
On Wed, Mar 18, 2015 at 12:21 PM, Steven Jones
steven.jones1...@outlook.com wrote:
   - typical DW - relatively constant periodic data loads - i.e. heavy write
   - we receive large CSV files ~ 5-10Gb every 15 minutes spread out across
 5-7 minutes
   - DB size is ~2.5Tb; rotating load of 30 days keeps the database stable

And an important addition: how is your autovacuum configured?
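
For reference, the current values can be pulled in one query:

  SELECT name, setting FROM pg_settings WHERE name LIKE '%autovacuum%';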


-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very slow checkpoints

2015-03-18 Thread Ilya Kosmodemiansky
Hi Steven,

On Wed, Mar 18, 2015 at 12:21 PM, Steven Jones
steven.jones1...@outlook.com wrote:
   - system is HP blade; 128Gb RAM, 2x 8-core, 12x 10k RPM RAID1+0 (database)

Do you have a BBU on your controller? And how is your controller
configured? I mean cache mode, IO mode, disk write cache mode. You
have 15K SAS (which form factor?) under the WAL and 10K SAS under the
database, am I correct?

 Full iostat/iotop, configuration, checkpoint stats, etc. are pasted below
 for completeness. Highlights are:
 checkpoint_segments=512
 shared_buffers=16GB
 checkpoint_timeout=15min
 checkpoint_completion_target=0.1

Besides everything else, your checkpoint settings look a bit strange.
If you choose a high value for checkpoint_segments, your aim is to
avoid checkpoints triggered by timeout (or vice versa). If you have
checkpoint_segments=512, your checkpoint_timeout should be about
60min. And in any case, set checkpoint_completion_target=0.9 or 0.7 in
order to spread the disk load between checkpoints.
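
In other words, a consistent combination would look something like:

  checkpoint_segments = 512
  checkpoint_timeout = 60min
  checkpoint_completion_target = 0.9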



 ---
 sysctl settings for dirty pages

 vm.dirty_background_bytes = 0
 vm.dirty_background_ratio = 5
 vm.dirty_bytes = 0
 vm.dirty_expire_centisecs = 3000
 vm.dirty_ratio = 10
 vm.dirty_writeback_centisecs = 500

Values for these settings really depend on the RAID controller (and BBU size).

And regarding the further problem description: do you have any
graphical representation of your % disk utilization?

Best regards,
Ilya

-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-15 Thread Ilya Kosmodemiansky



 On Mar 15, 2015, at 13:45, Josh Krupka jkru...@gmail.com wrote:
 Hmm that's definitely odd that it's swapping since it has plenty of free 
 memory at the moment.  Is it still under heavy load right now?  Has the 
 output of free consistently looked like that during your trouble times?

And it seems better to disable swappiness.
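
For example (a sketch; 0 tells these kernels to avoid swapping as long
as possible):

  sysctl -w vm.swappiness=0
  echo 'vm.swappiness = 0' >> /etc/sysctl.conf   # persist across reboots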


 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-15 Thread Ilya Kosmodemiansky
Hi!

What does your pg_stat_bgwriter show over one day?
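
For example, snapshot the counters now and again in 24 hours, then
compare (a sketch):

  SELECT now() AS ts, checkpoints_timed, checkpoints_req,
         buffers_checkpoint, buffers_clean, buffers_backend
  FROM pg_stat_bgwriter;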


 On Mar 15, 2015, at 11:54, Robert Kaye r...@musicbrainz.org wrote:
 
 Hi!
 
 We at MusicBrainz have been having trouble with our Postgres install for the 
 past few days. I’ve collected all the relevant information here:
 
   http://blog.musicbrainz.org/2015/03/15/postgres-troubles/
 
 If anyone could provide tips, suggestions or other relevant advice for what 
 to poke at next, we would love it.
 
 Thanks!
 
 --
 
 --ruaok
 
 Robert Kaye -- r...@musicbrainz.org --http://musicbrainz.org
 


Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-15 Thread Ilya Kosmodemiansky
On Sun, Mar 15, 2015 at 8:46 PM, Andres Freund and...@2ndquadrant.com wrote:
 That imo doesn't really have anything to do with it. The primary benefit
 of a BBU with writeback caching is accelerating (near-)synchronous
 writes. Like the WAL.

My point was that with no proper RAID controller (today a BBU is
surely needed for a controller to count as a proper one) plus heavy
writes of any kind, it is absolutely impossible to live with large
shared_buffers and without IO problems.


 Greetings,

 Andres Freund

 --
  Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MusicBrainz postgres performance issues

2015-03-15 Thread Ilya Kosmodemiansky
On Sun, Mar 15, 2015 at 8:20 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-03-15 11:09:34 -0600, Scott Marlowe wrote:
 shared_mem of 12G is almost always too large. I'd drop it down to ~1G or so.

 I think that's a outdated wisdom, i.e. not generally true.

Quite agreed. With the note that a properly configured controller with a BBU is needed.


 A new enough kernel, a sane filesystem
 (i.e. not ext3) and sane checkpoint configuration takes care of most of
 the other disadvantages.

Most likely. And better to make sure the filesystem is mounted without barriers.

And I agree with Scott: 64MB work_mem AND max_connections = 500 is a
recipe for disaster. The problem could be the session mode of
pgbouncer. If you can work with transaction mode, do it.
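
A minimal pgbouncer.ini fragment for that (the pool sizes are
assumptions, not a recommendation for this particular site):

  [pgbouncer]
  pool_mode = transaction
  max_client_conn = 1000
  default_pool_size = 32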


Best regards,
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Server vacuuming the same table again and again

2014-04-25 Thread Ilya Kosmodemiansky
Hi Dmitry,

On Fri, Apr 25, 2014 at 9:47 AM, Дмитрий Шалашов skau...@gmail.com wrote:
 cancelled autovacuum and it seems to help.

 In the morning autovacuum was back. And then it finished and I gone to work.

Actually, these two things are tightly bound and there is no way to
avoid vacuum; you can only postpone it, since this kind of work is
eventually supposed to be done.

What you really need to do as a first step is configure your
autovacuum aggressively enough, and then maybe ionice the autovacuum
workers instead of the mission-critical checkpointer or bgwriter.

Which exact values have you in the following settings:

 autovacuum_analyze_scale_factor
 autovacuum_analyze_threshold
 autovacuum_freeze_max_age
 autovacuum_max_workers
 autovacuum_naptime
 autovacuum_vacuum_cost_delay
 autovacuum_vacuum_cost_limit
 autovacuum_vacuum_scale_factor
 autovacuum_vacuum_threshold
 log_autovacuum_min_duration

?

Best regards, Ilya

 Best regards,
 Dmitriy Shalashov



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Server vacuuming the same table again and again

2014-04-25 Thread Ilya Kosmodemiansky
Dmitry,

How exactly is the filesystem under your database mounted? (mount -l)
And just in case: while increasing checkpoint_segments, better to also
increase checkpoint_timeout, otherwise checkpoints will still be
frequent because the segment threshold will never be reached. You
could monitor pg_stat_bgwriter to see which type of checkpoint happens
more frequently.

On Fri, Apr 25, 2014 at 10:22 AM, Дмитрий Шалашов skau...@gmail.com wrote:
 I understand that autovacuum has to be done, but not right after previous
 autovacuum? And then again and again.

That is exactly what happens: your autovacuum is not aggressive
enough, and that is why it runs constantly instead of doing its job in
small portions.

you should try something like this:

 autovacuum  | on
 autovacuum_analyze_scale_factor | 0.05
 autovacuum_analyze_threshold| 5
 autovacuum_freeze_max_age   | 2
 autovacuum_max_workers  | 10 # set 10 for example, and
then you could see: if they are all working constantly, maybe you need
more, or fewer if not.
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime  | 1
 autovacuum_vacuum_cost_delay| 10
 autovacuum_vacuum_cost_limit| -1
 autovacuum_vacuum_scale_factor  | 0.01
 autovacuum_vacuum_threshold | 10
 log_autovacuum_min_duration | -1


Best regards, Ilya

 Best regards,
 Dmitriy Shalashov


 2014-04-25 12:12 GMT+04:00 Ilya Kosmodemiansky
 ilya.kosmodemian...@postgresql-consulting.com:

 Hi Dmitry,

 On Fri, Apr 25, 2014 at 9:47 AM, Дмитрий Шалашов skau...@gmail.com
 wrote:
  cancelled autovacuum and it seems to help.

  In the morning autovacuum was back. And then it finished and I gone to
  work.

 Actually, these two things are tightly bound and there is no chance to
 avoid vacuum, you can only postpone it, this kind of work eventually
 supposed to be done.

 What you really need to do as a first thing - configure your
 autovacuum aggressively enough and then maybe ionice autovacuum
 instead of mission critical checkpointer or bgwriter.

 Which exact values have you in the following settings:

  autovacuum_analyze_scale_factor
  autovacuum_analyze_threshold
  autovacuum_freeze_max_age
  autovacuum_max_workers
  autovacuum_naptime
  autovacuum_vacuum_cost_delay
  autovacuum_vacuum_cost_limit
  autovacuum_vacuum_scale_factor
  autovacuum_vacuum_threshold
  log_autovacuum_min_duration

 ?

 Best regards, Ilya
 
  Best regards,
  Dmitriy Shalashov



 --
 Ilya Kosmodemiansky,

 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com





-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Server vacuuming the same table again and again

2014-04-25 Thread Ilya Kosmodemiansky
Dmitry,

On Fri, Apr 25, 2014 at 11:31 AM, Дмитрий Шалашов skau...@gmail.com wrote:
 Thanks, we'll try new autovacuum settings!

I think things with vacuum will be much better.

If not, try to find out whether you have long-running transactions
(several minutes or more) and try to avoid them.


 First, I have to say that the load comes and goes in waves - we don't yet
 understand why.
 All new waves have that behaviour - free disk, idle cpu.
 First such wave was before the checkpoint_segments change, next waves after.

That could be a complicated problem caused by many things, from
suboptimal SQL queries to network issues, and may not be easy to guess.

- how many locks do you have during the wave in comparison with the
normal workload? (see the sketch below)
- do you use some connection pooling (pgbouncer etc.)?
- how about the long-running transactions I mentioned above?
- are you using pg_stat_statements or any other method for detecting
slow queries?
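
For the locks and the long-running transactions, something like this
gives a quick picture (a sketch for 9.2+):

  -- lock requests currently waiting
  SELECT mode, count(*) FROM pg_locks WHERE NOT granted GROUP BY mode;

  -- transactions open longer than 5 minutes
  SELECT pid, now() - xact_start AS age, state, query
  FROM pg_stat_activity
  WHERE xact_start < now() - interval '5 minutes'
  ORDER BY age DESC;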



 Best regards,
 Dmitriy Shalashov


 2014-04-25 13:22 GMT+04:00 Ilya Kosmodemiansky
 ilya.kosmodemian...@postgresql-consulting.com:

 On Fri, Apr 25, 2014 at 10:29 AM, Дмитрий Шалашов skau...@gmail.com
 wrote:
  Previously during load disk was 100% busy; now we have around 100 active
  state queries, 100% loaded proc, but disk is virtually idle...

Did that happen after changing the checkpoint_segments setting?

 
 
  Best regards,
  Dmitriy Shalashov
 
 
  2014-04-25 12:22 GMT+04:00 Дмитрий Шалашов skau...@gmail.com:
 
  Hi Ilya!
 
   Actually, these two things are tightly bound and there is no chance
   to
   avoid vacuum, you can only postpone it, this kind of work eventually
   supposed to be done.
 
  I understand that autovacuum has to be done, but not right after
  previous
  autovacuum? And then again and again.
  And after cancelling that first autovacuum I started another one by
  hand;
  from there no autovacuum was cancelled.
 
   ionice autovacuum instead of mission critical checkpointer or
   bgwriter
  Yeah, that was desperate. I restarted server when I had a chance - to
  drop
  my ionice settings back to defaults.
 
   Which exact values have you in the following settings:
 
  autovacuum_analyze_scale_factor = 0.1
  autovacuum_analyze_threshold = 50
  autovacuum_freeze_max_age = 2
  autovacuum_max_workers = 3
  autovacuum_naptime = 60
  autovacuum_vacuum_cost_delay = 20
  autovacuum_vacuum_cost_limit = -1
  autovacuum_vacuum_scale_factor = 0.2
  autovacuum_vacuum_threshold = 50
  log_autovacuum_min_duration = 0
 
  All defaults except last one I believe.
 
 
  Meanwhile I noticed in the night logs:
  checkpoints are occurring too frequently (138 seconds apart)
  Consider increasing the configuration parameter checkpoint_segments.
 
  Increased checkpoint_segments to 256 and reloaded config.
 
 
  Best regards,
  Dmitriy Shalashov
 
 
  2014-04-25 12:12 GMT+04:00 Ilya Kosmodemiansky
  ilya.kosmodemian...@postgresql-consulting.com:
 
  Hi Dmitry,
 
  On Fri, Apr 25, 2014 at 9:47 AM, Дмитрий Шалашов skau...@gmail.com
  wrote:
   cancelled autovacuum and it seems to help.
 
   In the morning autovacuum was back. And then it finished and I gone
   to
   work.
 
  Actually, these two things are tightly bound and there is no chance to
  avoid vacuum, you can only postpone it, this kind of work eventually
  supposed to be done.
 
  What you really need to do as a first thing - configure your
  autovacuum aggressively enough and then maybe ionice autovacuum
  instead of mission critical checkpointer or bgwriter.
 
  Which exact values have you in the following settings:
 
   autovacuum_analyze_scale_factor
   autovacuum_analyze_threshold
   autovacuum_freeze_max_age
   autovacuum_max_workers
   autovacuum_naptime
   autovacuum_vacuum_cost_delay
   autovacuum_vacuum_cost_limit
   autovacuum_vacuum_scale_factor
   autovacuum_vacuum_threshold
   log_autovacuum_min_duration
 
  ?
 
  Best regards, Ilya
  
   Best regards,
   Dmitriy Shalashov
 
 
 
  --
  Ilya Kosmodemiansky,
 
  PostgreSQL-Consulting.com
  tel. +14084142500
  cell. +4915144336040
  i...@postgresql-consulting.com
 
 
 



 --
 Ilya Kosmodemiansky,

 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com





-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PGSQL, checkpoints, and file system syncs

2014-04-04 Thread Ilya Kosmodemiansky
Hi Reza,

vm.dirty_bytes indeed makes sense, but just in case: how exactly is
your ext4 mounted? In particular, have you disabled barriers?

Ilya

On Thu, Apr 3, 2014 at 8:11 PM, Reza Taheri rtah...@vmware.com wrote:
 Try setting the vm.dirty_bytes sysctl. Something like 256MB might be a good
 starting point.



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why shared_buffers max is 8GB?

2014-03-26 Thread Ilya Kosmodemiansky
Hi Alexey,

On Wed, Mar 26, 2014 at 1:21 PM, Alexey Vasiliev leopard...@inbox.ru wrote:
 I read from several sources, what maximum shared_buffers is 8GB.

I believe that was an issue on some older versions, and that's why it
was mentioned in several talks. Today it is a sort of apocrypha.

 Does this true? If yes, why exactly this number is maximum number of
 shared_buffers for good performance (on Linux 64-bits)?

25% of available RAM is a good starting point. Sometimes, if you have
a heavy workload _and_ the whole database can reside in memory, it is
better to use something larger, about ~75% of RAM.
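
As a worked example, on a 64GB machine (an assumption) that rule of
thumb would mean:

  shared_buffers = 16GB     # 25% of RAM, the usual starting point
  #shared_buffers = 48GB    # ~75%, only if the whole DB fits in memory
                            # and the IO stack is tuned accordingly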

Best regards,
Ilya
-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why shared_buffers max is 8GB?

2014-03-26 Thread Ilya Kosmodemiansky
On Wed, Mar 26, 2014 at 5:14 PM, Shaun Thomas stho...@optionshouse.com wrote:
 * Checkpoints must commit dirty shared buffers to disk. The larger this is, 
 the more risk you have when checkpoints come, up to and including an 
 unresponsive database. Writing to disks isn't free, and sadly this is still 
 on the slower side unless all of your storage is SSD-based. You don't want to 
 set this too much higher than your disk write cache.

On some heavily loaded machines we use 48GB of shared buffers (and
sometimes more, depending on the amount of RAM). Of course that works
only with a good enough hardware RAID with a large BBU, a well-tuned
Linux (dirty bytes appropriate to the RAID cache size, etc.) and
aggressively tuned checkpoints and background writer:

 bgwriter_delay               | 10
 bgwriter_lru_maxpages        | 1000
 bgwriter_lru_multiplier      | 10
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 300
 checkpoint_timeout           | 3600

and it really makes sense


-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow join not using index properly

2014-03-25 Thread Ilya Kosmodemiansky
:
 23558kB
Buffers: shared hit=3991
-  Seq Scan on collection_data
 (cost=0.00..10020.40 rows=602940 width=8) (actual time=0.006..146.447
 rows=603066 loops=1)
  Buffers: shared hit=3991
  Total runtime: 27854.200 ms


 I also tried including the MIN( a1.context_key ) in the first select
 statement as you had written it, but upon doing that it became necessary to
 add a GROUP BY clause, and doing that changed the final number of rows
 selected:

 ERROR:  column a1.ancestor_key must appear in the GROUP BY clause or be
 used in an aggregate function
 LINE 4:   min( a1.context_key ), ancestor_key

  ^

 Including the LIMIT 1 at the end of the first select statement gave a syntax
 error that I couldn't seem to get past, so I think it might be invalid:

 ERROR:  syntax error at or near UNION
 LINE 8: UNION (
 ^


 So I landed on the version that I posted above, which seems to select the
 same set in all of the cases that I tried.

 Anyway, thanks again for taking a stab at helping, I do appreciate it. If
 you have any other ideas that might be of help I'd certainly be happy to
 hear them.

 Take care,
  /Stefan




 On Thu, Mar 20, 2014 at 11:02 PM, Ilya Kosmodemiansky
 ilya.kosmodemian...@postgresql-consulting.com wrote:

 Hi Stefan!

 Probably you need to rewrite your query like this (check it first):

 with RECURSIVE qq(cont_key, anc_key) as
 (
 select min(a1.context_key), ancestor_key from virtual_ancestors a1
  union select
   (SELECT
 a1.context_key, ancestor_key
   FROM
 virtual_ancestors a1 where context_key > cont_key order by
 context_key limit 1) from qq where cont_key is not null
 )
 select a1.cont_key
  from qq a1, collection_data, virtual_ancestors a2
 WHERE
 a1.anc_key =  collection_data.context_key
 AND collection_data.collection_context_key = a2.context_key
 AND a2.ancestor_key = ?

 best regards,
 Ilya

 On Fri, Mar 21, 2014 at 12:56 AM, Stefan Amshey srams...@gmail.com
 wrote:
  We have a slow performing query that we are trying to improve, and it
  appears to be performing a sequential scan at a point where it should be
  utilizing an index. Can anyone tell me why postgres is opting to do it
  this
  way?
 
  The original query is as follows:
 
  SELECT DISTINCT
  a1.context_key
  FROM
  virtual_ancestors a1, collection_data, virtual_ancestors a2
  WHERE
  a1.ancestor_key =  collection_data.context_key
  AND collection_data.collection_context_key = a2.context_key
  AND a2.ancestor_key = ?
 
  The key relationships should all using indexed columns, but the query
  plan
  that postgres comes up with ends up performing a sequential scan on the
  collection_data table (in this case about 602k rows) where we would have
  expected it to utilize the index:
 
   HashAggregate  (cost=60905.73..60935.73 rows=3000 width=4) (actual
  time=3366.165..3367.354 rows=3492 loops=1)
 Buffers: shared hit=16291 read=1222
 -  Nested Loop  (cost=17546.26..60898.23 rows=3000 width=4) (actual
  time=438.332..3357.918 rows=13037 loops=1)
   Buffers: shared hit=16291 read=1222
   -  Hash Join  (cost=17546.26..25100.94 rows=98 width=4)
  (actual
  time=408.554..415.767 rows=2092 loops=1)
 Hash Cond: (a2.context_key =
  collection_data.collection_context_key)
 Buffers: shared hit=4850 read=3
 -  Index Only Scan using virtual_ancestors_pkey on
  virtual_ancestors a2  (cost=0.00..233.32 rows=270 width=4) (actual
  time=8.532..10.703 rows=1960 loops=1)
   Index Cond: (ancestor_key = 1072173)
   Heap Fetches: 896
   Buffers: shared hit=859 read=3
 -  Hash  (cost=10015.56..10015.56 rows=602456 width=8)
  (actual time=399.708..399.708 rows=602570 loops=1)
   Buckets: 65536  Batches: 1  Memory Usage: 23538kB
   Buffers: shared hit=3991
   sequential scan occurs here ##
   -  Seq Scan on collection_data
  (cost=0.00..10015.56
  rows=602456 width=8) (actual time=0.013..163.509 rows=602570 loops=1)
 Buffers: shared hit=3991
   -  Index Only Scan using virtual_ancestors_pkey on
  virtual_ancestors a1  (cost=0.00..360.70 rows=458 width=8) (actual
  time=1.339..1.403 rows=6 loops=2092)
 Index Cond: (ancestor_key = collection_data.context_key)
 Heap Fetches: 7067
 Buffers: shared hit=11441 read=1219
   Total runtime: 3373.058 ms
 
 
  The table definitions are as follows:
 
Table public.virtual_ancestors
  Column|   Type   | Modifiers
  --+--+---
   ancestor_key | integer  | not null
   context_key  | integer  | not null
   degree   | smallint | not null
  Indexes:
  virtual_ancestors_pkey PRIMARY KEY

Re: [PERFORM] pg_dump vs pg_basebackup

2014-03-25 Thread Ilya Kosmodemiansky
Hi gianfranco,


How large exactly is your database, and how heavy is the workload on
it? Usually, if you have more than ~200GB, it is better to use
pg_basebackup, because pg_dump will take too long. And please keep in
mind that pg_dump makes a dump, which is actually not the same thing
as a backup.
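
For reference, a typical pg_basebackup run looks like this (host, user
and target directory are placeholders):

  pg_basebackup -h dbhost -U replication -D /backups/base -F tar -z -P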

Best regards,
Ilya

On Tue, Mar 25, 2014 at 5:45 AM, gianfranco caca limpc...@yahoo.com wrote:
 Hai,

 Can anyone tell me the difference and performance between pgdump and
 pg_basebackup if I want to backup a large database.

 Thanks



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump vs pg_basebackup

2014-03-25 Thread Ilya Kosmodemiansky
Yes, you need to set recovery_target_time in your recovery.conf while
performing recovery
(http://www.postgresql.org/docs/9.3/static/recovery-target-settings.html).
That can be a tricky thing, depending on what exactly you need: all
transactions that were not committed at the given timestamp will be
rolled back, so read the URL above carefully.
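
A recovery.conf sketch for 9.3 (the archive path and the timestamp are
placeholders):

  restore_command = 'cp /wal_archive/%f %p'
  recovery_target_time = '2014-03-25 06:00:00'
  #recovery_target_inclusive = false   # stop just before the target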

On Tue, Mar 25, 2014 at 8:19 AM, gianfranco caca limpc...@yahoo.com wrote:
 Hai ilya,

 Thanks for the respond. The database is estimated over 100gb and the
 workload will be high. Can we use a pg_basebackup with pitr to restore based
 on transaction time?

 Thanks


 On Tuesday, 25 March 2014, 15:13, Ilya Kosmodemiansky
 ilya.kosmodemian...@postgresql-consulting.com wrote:
 Hi gianfranco,


 How exactly large is your database and how heavy is a workload on it?
 Usually if you have more than ~200Gb, better to use pg_basebackup
 because pg_dump will take too long time. And please take in mind, that
 pg_dump makes dump, which is  actually not the same thing as a backup.

 Best regards,
 Ilya

 On Tue, Mar 25, 2014 at 5:45 AM, gianfranco caca limpc...@yahoo.com wrote:
 Hai,

 Can anyone tell me the difference and performance between pgdump and
 pg_basebackup if I want to backup a large database.

 Thanks




 --
 Ilya Kosmodemiansky,

 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance






-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump vs pg_basebackup

2014-03-25 Thread Ilya Kosmodemiansky
Joshua,

that is a really good point: an alternative is to use pg_basebackup
through an ssh tunnel with compression, but rsync is much simpler.

On Tue, Mar 25, 2014 at 3:56 PM, Joshua D. Drake j...@commandprompt.com wrote:

 On 03/25/2014 05:05 AM, Claudio Freire wrote:


 On Tue, Mar 25, 2014 at 4:39 AM, David Johnston pol...@yahoo.com wrote:

 Hai,

 Can anyone tell me the difference and performance between pgdump and
 pg_basebackup if I want to backup a large database.


 Honestly,

 Neither is particularly good at backing up large databases. I would look
 into PITR with rsync.

 JD


 --
 Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
 PostgreSQL Support, Training, Professional Services and Development
 High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
 Political Correctness is for cowards.



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump vs pg_basebackup

2014-03-25 Thread Ilya Kosmodemiansky
Joshua,

On Tue, Mar 25, 2014 at 4:22 PM, Joshua D. Drake j...@commandprompt.com wrote:
The advantage is that you can create backups that don't
 have to be restored, just started. You can also use the differential
 portions of rsync to do it multiple times a day without much issue.

Are you sure that is a good idea on a database with a heavy write workload?

And also, I'm not sure that differential backups made with rsync will be
recoverable, if that is what you actually meant.



 JD

 --
 Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
 PostgreSQL Support, Training, Professional Services and Development
 High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
 Political Correctness is for cowards.



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump vs pg_basebackup

2014-03-25 Thread Ilya Kosmodemiansky
Magnus,

That is correct, but I'm afraid that such all-in-one functionality
also hides how backup really works. Probably that sort of knowledge is
so essential for a DBA that it is better to learn both methods, at
least to be able to choose correctly? But maybe that is a rhetorical
question.

On Tue, Mar 25, 2014 at 4:21 PM, Magnus Hagander mag...@hagander.net wrote:
 I would say that's the one thing that rsync is *not*. pg_basebackup takes
 care of a lot of things under the hood. rsync is a lot more complicated, in
 particular in failure scenarios, since you have to manually deal with
 pg_start/stop_backup().

 There are definitely reasons you'd prefer rsync over pg_basebackup, but I
 don't believe simplicity is one of them.

 //Magnus


 On Tue, Mar 25, 2014 at 4:18 PM, Ilya Kosmodemiansky
 ilya.kosmodemian...@postgresql-consulting.com wrote:

 Joshua,

 that is a really good point: an alternative is to use pg_basebackup
 through an ssh tunnel with compression, but rsync is much simpler.

 On Tue, Mar 25, 2014 at 3:56 PM, Joshua D. Drake j...@commandprompt.com
 wrote:
 
  On 03/25/2014 05:05 AM, Claudio Freire wrote:
 
 
  On Tue, Mar 25, 2014 at 4:39 AM, David Johnston pol...@yahoo.com
  wrote:
 
  Hai,
 
  Can anyone tell me the difference and performance between pgdump and
  pg_basebackup if I want to backup a large database.
 
 
  Honestly,
 
  Neither is particularly good at backing up large databases. I would look
  into PITR with rsync.
 
  JD
 
 
  --
  Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
  PostgreSQL Support, Training, Professional Services and Development
  High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
  Political Correctness is for cowards.
 
 
 
  --
  Sent via pgsql-performance mailing list
  (pgsql-performance@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-performance



 --
 Ilya Kosmodemiansky,

 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump vs pg_basebackup

2014-03-25 Thread Ilya Kosmodemiansky
OK, agreed. I've got your point ;-)

On Tue, Mar 25, 2014 at 4:40 PM, Magnus Hagander mag...@hagander.net wrote:
 Oh, I agree it's good that you should know both methods. I only disagree
 with that the choice of rsync be made with the argument of simplicity.
 Simplicity is one of the main reasons to choose the *other* method
 (pg_basebackup), and the rsync method is for more advanced usecases. But
 it's definitely good to know both!

 //Magnus



 On Tue, Mar 25, 2014 at 4:37 PM, Ilya Kosmodemiansky
 ilya.kosmodemian...@postgresql-consulting.com wrote:

 Magnus,

 That is correct, but I'm afraid that such all-in-one functionality
 also hides from one how backup really works. Probably such sort of
 knowledge is so essential for a DBA, that it is better to learn both
 methods, at least to be able to choose correctly? But maybe it is a
 rhetorical question.

 On Tue, Mar 25, 2014 at 4:21 PM, Magnus Hagander mag...@hagander.net
 wrote:
  I would say that's the one thing that rsync is *not*. pg_basebackup
  takes
  care of a lot of things under the hood. rsync is a lot more complicated,
  in
  particular in failure scenarios, since you have to manually deal with
  pg_start/stop_backup().
 
  There are definitely reasons you'd prefer rsync over pg_basebackup, but
  I
  don't believe simplicity is one of them.
 
  //Magnus
 
 
  On Tue, Mar 25, 2014 at 4:18 PM, Ilya Kosmodemiansky
  ilya.kosmodemian...@postgresql-consulting.com wrote:
 
  Joshua,
 
  that is a really good point: an alternative is to use pg_basebackup
  through an ssh tunnel with compression, but rsync is much simpler.
 
  On Tue, Mar 25, 2014 at 3:56 PM, Joshua D. Drake j...@commandprompt.com
  wrote:
  
   On 03/25/2014 05:05 AM, Claudio Freire wrote:
  
  
   On Tue, Mar 25, 2014 at 4:39 AM, David Johnston pol...@yahoo.com
   wrote:
  
   Hai,
  
   Can anyone tell me the difference and performance between pgdump
   and
   pg_basebackup if I want to backup a large database.
  
  
   Honestly,
  
   Neither is particularly good at backing up large databases. I would
   look
   into PITR with rsync.
  
   JD
  
  
   --
   Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
   PostgreSQL Support, Training, Professional Services and Development
   High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
   Political Correctness is for cowards.
  
  
  
   --
   Sent via pgsql-performance mailing list
   (pgsql-performance@postgresql.org)
   To make changes to your subscription:
   http://www.postgresql.org/mailpref/pgsql-performance
 
 
 
  --
  Ilya Kosmodemiansky,
 
  PostgreSQL-Consulting.com
  tel. +14084142500
  cell. +4915144336040
  i...@postgresql-consulting.com
 
 
  --
  Sent via pgsql-performance mailing list
  (pgsql-performance@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-performance
 
 
 
 
  --
   Magnus Hagander
   Me: http://www.hagander.net/
   Work: http://www.redpill-linpro.com/



 --
 Ilya Kosmodemiansky,

 PostgreSQL-Consulting.com
 tel. +14084142500
 cell. +4915144336040
 i...@postgresql-consulting.com




 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow join not using index properly

2014-03-21 Thread Ilya Kosmodemiansky
 persists despite re-clustering the index.

 We are using PostgreSQL 9.2.5 on x86_64-unknown-linux-gnu, compiled by gcc
 (Debian 4.4.5-8) 4.4.5, 64-bit

 Interestingly, on an instance running PostgreSQL 9.2.4 on
 x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.4.5-8) 4.4.5, 64-bit
 where I copied the 2 tables over to a temporary database, the plan comes out
 differently:

  HashAggregate  (cost=39692.03..39739.98 rows=4795 width=4) (actual
 time=73.285..75.141 rows=3486 loops=1)
Buffers: shared hit=22458
-  Nested Loop  (cost=0.00..39680.05 rows=4795 width=4) (actual
 time=0.077..63.116 rows=13007 loops=1)
  Buffers: shared hit=22458
  -  Nested Loop  (cost=0.00..32823.38 rows=164 width=4) (actual
 time=0.056..17.685 rows=2084 loops=1)
Buffers: shared hit=7529
-  Index Only Scan using virtual_ancestors_pkey on
 virtual_ancestors a2  (cost=0.00..1220.85 rows=396 width=4) (actual
 time=0.025..2.732 rows=1954 loops=1)
  Index Cond: (ancestor_key = 1072173)
  Heap Fetches: 1954
  Buffers: shared hit=1397
  Note the index scan here - this is what it SHOULD be doing
 ##
-  Index Scan using collection_data_context_key_index on
 collection_data  (cost=0.00..79.24 rows=56 width=8) (actual
 time=0.004..0.005 rows=1 loops=1954)
  Index Cond: (collection_context_key = a2.context_key)
  Buffers: shared hit=6132
  -  Index Only Scan using virtual_ancestors_pkey on
 virtual_ancestors a1  (cost=0.00..35.40 rows=641 width=8) (actual
 time=0.007..0.015 rows=6 loops=2084)
Index Cond: (ancestor_key = collection_data.context_key)
Heap Fetches: 13007
Buffers: shared hit=14929
  Total runtime: 76.431 ms

 Why can't I get the Postgres 9.2.5 instance to use the optimal plan?

 Thanks in advance!
  /Stefan

 --
 -
 Stefan Amshey



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
i...@postgresql-consulting.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Problem with ExclusiveLock on inserts

2014-02-19 Thread Ilya Kosmodemiansky
Vladimir,

pgbouncer works with pl/proxy in transaction pooling mode. The
widespread claim that statement mode is for plproxy does not imply any
limitation on transaction pooling mode as long as you have autocommit
on the client. Anyway, try to reduce connections.

Try to set your autovacuum a bit more aggressively:


 autovacuum_analyze_scale_factor=0.05 #or like that
 autovacuum_analyze_threshold=5
 autovacuum_freeze_max_age=2
 autovacuum_max_workers=20 # that is fine for slow disks
 autovacuum_naptime=1
 autovacuum_vacuum_cost_delay=5 # or at least 10
 autovacuum_vacuum_cost_limit =-1
 autovacuum_vacuum_scale_factor=0.01 # this setting has to be really
aggressive, otherwise you simply postpone huge vacuums and the related disk IO;
smaller portions are better
 autovacuum_vacuum_threshold=20

Probably you will also need some ionice for the autovacuum workers.
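
For example (a sketch; -c2 -n7 is the best-effort class at the lowest
priority, and pgrep -f matches the worker processes by command line):

  for pid in $(pgrep -f 'autovacuum worker'); do
    ionice -c2 -n7 -p "$pid"
  done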



On Thu, Feb 13, 2014 at 11:26 AM, Бородин Владимир r...@simply.name wrote:


 13.02.2014, в 13:29, Ilya Kosmodemiansky hydrobi...@gmail.com
 написал(а):

 Vladimir,

 And, any effect on your problem?


 It worked without problems longer than previous configuration but repeated
 again several minutes ago :(


 On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир r...@simply.name
 wrote:

 I have limited max connections to 1000, reduced shared buffers to 8G and
 restarted postgres.


 1000 is still too much in most cases. With pgbouncer in transaction
 pooling mode, normally a pool size of 8-32, max_connections = 100 (default
 value) and 500-1500 client connections look more reasonable.


 Clients for this db are plproxy hosts. As far as I know plproxy can work
 only with statement pooling.



 I have also noticed that this big tables stopped vacuuming automatically a
 couple of weeks ago. It could be the reason of the problem, I will now try
 to tune autovacuum parameters to turn it back. But yesterday I ran vacuum
 analyze for all relations manually but that did not help.


 What do your autovacuum parameters look like now?


 They were all default except for vacuum_defer_cleanup_age = 10. I have
 increased autovacuum_max_workers = 20 because I have 10 databases with
 about 10 tables each. That did not make better (I haven't seen more than
 two auto vacuum workers simultaneously). Then I have tried to
 set vacuum_cost_limit = 1000. Still not vacuuming big tables. Right now the
 parameters look like this:

 root@rpopdb01e ~ # fgrep vacuum
 /var/lib/pgsql/9.3/data/conf.d/postgresql.conf
 #vacuum_cost_delay = 0  # 0-100 milliseconds
 #vacuum_cost_page_hit = 1   # 0-10000 credits
 #vacuum_cost_page_miss = 10 # 0-10000 credits
 #vacuum_cost_page_dirty = 20# 0-10000 credits
 vacuum_cost_limit = 1000# 1-10000 credits
 vacuum_defer_cleanup_age = 10   # number of xacts by which cleanup
 is delayed
 autovacuum = on # Enable autovacuum subprocess?
  'on'
 log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions
 and
 autovacuum_max_workers = 20 # max number of autovacuum
 subprocesses
 #autovacuum_naptime = 1min  # time between autovacuum runs
 #autovacuum_vacuum_threshold = 50   # min number of row updates before
 # vacuum
 #autovacuum_analyze_threshold = 50  # min number of row updates before
 #autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before
 vacuum
 #autovacuum_analyze_scale_factor = 0.1  # fraction of table size before
 analyze
 #autovacuum_freeze_max_age = 200000000  # maximum XID age before forced
 vacuum
 #autovacuum_vacuum_cost_delay = 20ms# default vacuum cost delay for
 # autovacuum, in milliseconds;
 # -1 means use vacuum_cost_delay
 #autovacuum_vacuum_cost_limit = -1  # default vacuum cost limit for
 # autovacuum, -1 means use
 # vacuum_cost_limit
 #vacuum_freeze_min_age = 50000000
 #vacuum_freeze_table_age = 150000000
 root@rpopdb01e ~ #


 13.02.2014, в 0:14, Ilya Kosmodemiansky hydrobi...@gmail.com написал(а):

 On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир r...@simply.name
 wrote:


 Yes, this is legacy, I will fix it. We had lots of inactive connections
 but right now we use pgbouncer for this. When the workload is normal we
 have some kind of 80-120 backends. Less than 10 of them are in active
 state. Having problem with locks we get lots of sessions (sometimes more
 than 1000 of them are in active state). According to vmstat the number of
 context switches is not so big (less than 20k), so I don't think it is the
 main reason. Yes, it can aggravate the problem, but imho not create it.



 I'm afraid that is the problem. More than 1000 backends, most of them
 are simply waiting.



 I don't understand the correlation of shared buffers size and
 synchronous_commit. Could you please

Re: [PERFORM] Problem with ExclusiveLock on inserts

2014-02-13 Thread Ilya Kosmodemiansky
Vladimir,

And, any effect on your problem?

On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир r...@simply.name wrote:
 I have limited max connections to 1000, reduced shared buffers to 8G and 
 restarted postgres.

1000 is still too much in most cases. With pgbouncer in transaction
pooling mode, normally a pool size of 8-32, max_connections = 100 (the
default value) and 500-1500 client connections look more reasonable.


 I have also noticed that this big tables stopped vacuuming automatically a 
 couple of weeks ago. It could be the reason of the problem, I will now try to 
 tune autovacuum parameters to turn it back. But yesterday I ran vacuum 
 analyze for all relations manually but that did not help.

What do your autovacuum parameters look like now?

 13.02.2014, в 0:14, Ilya Kosmodemiansky hydrobi...@gmail.com написал(а):

 On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир r...@simply.name wrote:


 Yes, this is legacy, I will fix it. We had lots of inactive connections but 
 right now we use pgbouncer for this. When the workload is normal we have some 
 kind of 80-120 backends. Less than 10 of them are in active state. Having 
 problem with locks we get lots of sessions (sometimes more than 1000 of them 
 are in active state). According to vmstat the number of context switches is 
 not so big (less than 20k), so I don't think it is the main reason. Yes, it 
 can aggravate the problem, but imho not create it.



I'm afraid that is the problem. More than 1000 backends, most of them
simply waiting.



 I don't understand the correlation of shared buffers size and 
 synchronous_commit. Could you please explain your statement?



You need to fsync your huge shared buffers every time your database
performs a checkpoint. By default that usually happens too often,
because checkpoint_timeout is 5min. Without a BBU, on software RAID,
that leads to an IO spike and your commits wait for the WAL.



 12.02.2014, в 23:37, Ilya Kosmodemiansky hydrobi...@gmail.com написал(а):

 Another arguable thing is the degree of concurrency. How many of your
 max_connections = 4000 are actually running? 4000 definitely looks like
 overkill, and they could be a serious source of contention, especially when
 you have barriers enabled and software RAID.

 Plus, for 32GB of shared buffers with synchronous_commit = on, especially on
 a heavy workload, one should definitely have a BBU, otherwise performance
 will be poor.


 On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир r...@simply.name wrote:


 Oh, I haven't thought about barriers, sorry. Although I use soft raid without 
 batteries I have turned barriers off on one cluster shard to try.

 root@rpopdb01e ~ # mount | fgrep data
 /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
 root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
 root@rpopdb01e ~ # mount | fgrep data
 /dev/md2 on /var/lib/pgsql/9.3/data type ext4 
 (rw,noatime,nodiratime,nobarrier)
 root@rpopdb01e ~ #

 12.02.2014, в 21:56, Ilya Kosmodemiansky hydrobi...@gmail.com написал(а):

 My question was actually about the barrier option; by default it is enabled on
 RHEL6/ext4 and can cause a serious IO bottleneck before the disks are actually
 involved. What does mount say without arguments?

 On Feb 12, 2014, at 18:43, Бородин Владимир r...@simply.name wrote:

 root@rpopdb01e ~ # fgrep data /etc/fstab
 UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 
 noatime,nodiratime 0 1
 root@rpopdb01e ~ #

 According to iostat the disks are not the bottleneck.

 12.02.2014, в 21:30, Ilya Kosmodemiansky hydrobi...@gmail.com написал(а):

 Hi Vladimir,

 Just in case: how is your ext4 mounted?

 Best regards,
 Ilya

 On Feb 12, 2014, at 17:59, Бородин Владимир r...@simply.name wrote:

 Hi all.

 Today I have started getting errors like below in logs (seems that I have not 
 changed anything for last week). When it happens the db gets lots of 
 connections in state active, eats 100% cpu and clients get errors (due to 
 timeout).

 2014-02-12 15:44:24.562 
 MSK,rpop,rpopdb_p6,30061,localhost:58350,52fb5e53.756d,1,SELECT 
 waiting,2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,0,process 
 30061 still waiting for ExclusiveLock on extension of relation 26118 of 
 database 24590 after 1000.082 ms,SQL statement insert into 
 rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)

 I have read several topics [1, 2, 3, 4] with similar problems but haven't 
 find a good solution. Below is some more diagnostics.

 I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host 
 is running with the following CPU (32 cores) and memory:

 root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
 model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
 root@rpopdb01e ~ # free -m
 total   used   free sharedbuffers cached
 Mem:129028 123558   5469  0135 119504
 -/+ buffers/cache:   3918 125110
 Swap

Re: [PERFORM] Problem with ExclusiveLock on inserts

2014-02-13 Thread Ilya Kosmodemiansky
Vladimir,

pgbouncer works with pl/proxy in transaction pooling mode. The
widespread claim that statement mode is for plproxy does not imply any
limitation on transaction pooling mode as long as you have autocommit
on the client. Anyway, try to reduce connections.

Try to set your autovacuum a bit more aggressively:


 autovacuum_analyze_scale_factor=0.05 # or like that
 autovacuum_analyze_threshold=5
 autovacuum_freeze_max_age=2
 autovacuum_max_workers=20 # that is fine for slow disks
 autovacuum_naptime=1
 autovacuum_vacuum_cost_delay=5 # or at least 10
 autovacuum_vacuum_cost_limit =-1
 autovacuum_vacuum_scale_factor=0.01 # this setting has to be really
aggressive, otherwise you simply postpone huge vacuums and the related disk IO;
smaller portions are better
 autovacuum_vacuum_threshold=20

Probably you will also need some ionice for the autovacuum workers.


On Thu, Feb 13, 2014 at 1:13 PM, Ilya Kosmodemiansky 
ilya.kosmodemian...@postgresql-consulting.com wrote:

 Vladimir,

 pgbouncer works with pl/proxy in transaction pooling mode. The widespread
 claim that statement mode is for plproxy does not imply any limitation on
 transaction pooling mode as long as you have autocommit on the client.
 Anyway, try to reduce connections.

 try to set your autovacuum a bit more aggressive:


  autovacuum_analyze_scale_factor=0.05 #or like that
  autovacuum_analyze_threshold=5
  autovacuum_freeze_max_age=2
  autovacuum_max_workers=20 # that is fine for slow disks
  autovacuum_naptime=1
  autovacuum_vacuum_cost_delay=5 # or at least 10
  autovacuum_vacuum_cost_limit =-1
  autovacuum_vacuum_scale_factor=0.01 # this setting has to be really
 aggressive, otherwise you simply postpone huge vacuums and the related disk
 IO; smaller portions are better
  autovacuum_vacuum_threshold=20

 probably you will also need some ionice for autovacuum workers



 On Thu, Feb 13, 2014 at 11:26 AM, Бородин Владимир r...@simply.namewrote:


 13.02.2014, в 13:29, Ilya Kosmodemiansky hydrobi...@gmail.com
 написал(а):

 Vladimir,

 And, any effect on your problem?


 It worked without problems longer than previous configuration but
 repeated again several minutes ago :(


 On Thu, Feb 13, 2014 at 9:35 AM, Бородин Владимир r...@simply.name
 wrote:

 I have limited max connections to 1000, reduced shared buffers to 8G and
 restarted postgres.


 1000 is still too much in most cases. With pgbouncer in transaction
 pooling mode, normally a pool size of 8-32, max_connections = 100 (default
 value) and 500-1500 client connections look more reasonable.


 Clients for this db are plproxy hosts. As far as I know plproxy can work
 only with statement pooling.



 I have also noticed that this big tables stopped vacuuming automatically
 a couple of weeks ago. It could be the reason of the problem, I will now
 try to tune autovacuum parameters to turn it back. But yesterday I ran
 vacuum analyze for all relations manually but that did not help.


 What do your autovacuum parameters look like now?


 They were all default except for vacuum_defer_cleanup_age = 10. I
 have increased autovacuum_max_workers = 20 because I have 10 databases with
 about 10 tables each. That did not make better (I haven't seen more than
 two auto vacuum workers simultaneously). Then I have tried to
 set vacuum_cost_limit = 1000. Still not vacuuming big tables. Right now the
 parameters look like this:

 root@rpopdb01e ~ # fgrep vacuum
 /var/lib/pgsql/9.3/data/conf.d/postgresql.conf
 #vacuum_cost_delay = 0  # 0-100 milliseconds
 #vacuum_cost_page_hit = 1   # 0-10000 credits
 #vacuum_cost_page_miss = 10 # 0-10000 credits
 #vacuum_cost_page_dirty = 20# 0-10000 credits
 vacuum_cost_limit = 1000# 1-10000 credits
 vacuum_defer_cleanup_age = 10   # number of xacts by which
 cleanup is delayed
 autovacuum = on # Enable autovacuum subprocess?
  'on'
 log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions
 and
 autovacuum_max_workers = 20 # max number of autovacuum
 subprocesses
 #autovacuum_naptime = 1min  # time between autovacuum runs
 #autovacuum_vacuum_threshold = 50   # min number of row updates before
 # vacuum
 #autovacuum_analyze_threshold = 50  # min number of row updates before
 #autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before
 vacuum
 #autovacuum_analyze_scale_factor = 0.1  # fraction of table size before
 analyze
 #autovacuum_freeze_max_age = 2  # maximum XID age before forced
 vacuum
 #autovacuum_vacuum_cost_delay = 20ms# default vacuum cost delay for
 # autovacuum, in milliseconds;
 # -1 means use vacuum_cost_delay
 #autovacuum_vacuum_cost_limit = -1  # default vacuum cost limit for
 # autovacuum, -1 means use

Re: [PERFORM] Problem with ExclusiveLock on inserts

2014-02-12 Thread Ilya Kosmodemiansky
Hi Vladimir,

Just in case: how is your ext4 mounted?

Best regards, 
Ilya

 On Feb 12, 2014, at 17:59, Бородин Владимир r...@simply.name wrote:
 
 Hi all.
 
 Today I have started getting errors like below in logs (seems that I have not 
 changed anything for last week). When it happens the db gets lots of 
 connections in state active, eats 100% cpu and clients get errors (due to 
 timeout). 
 
 2014-02-12 15:44:24.562 
 MSK,rpop,rpopdb_p6,30061,localhost:58350,52fb5e53.756d,1,SELECT 
 waiting,2014-02-12 15:43:15 MSK,143/264877,1002850566,LOG,0,process 
 30061 still waiting for ExclusiveLock on extension of relation 26118 of 
 database 24590 after 1000.082 ms,SQL statement insert into 
 rpop.rpop_imap_uidls (folder_id, uidl) values (i_folder_id, i_uidl)
 
 I have read several topics [1, 2, 3, 4] with similar problems but haven't 
 find a good solution. Below is some more diagnostics.
 
 I am running PostgreSQL 9.3.2 installed from RPM packages on RHEL 6.4. Host 
 is running with the following CPU (32 cores) and memory:
 
 root@rpopdb01e ~ # fgrep -m1 'model name' /proc/cpuinfo
 model name: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
 root@rpopdb01e ~ # free -m
  total   used   free sharedbuffers cached
 Mem:129028 123558   5469  0135 119504
 -/+ buffers/cache:   3918 125110
 Swap:16378  0  16378
 root@rpopdb01e ~ #
 
 PGDATA lives on RAID6 array of 8 ssd-disks with ext4, iostat and atop say the 
 disks are really free. Right now PGDATA takes only 95G.
 The settings changed in postgresql.conf are here [5].
 
 When it happens the last query from here [6] shows that almost all queries 
 are waiting for ExclusiveLock, but they do a simple insert.
 
  (extend,26647,26825,,,) |5459 | ExclusiveLock | 1 | 
 (extend,26647,26825,,,) | 8053 | ExclusiveLock | 5459,8053
  (extend,26647,26828,,,) |5567 | ExclusiveLock | 1 | 
 (extend,26647,26828,,,) | 5490 | ExclusiveLock | 5567,5490
  (extend,24584,25626,,,) |5611 | ExclusiveLock | 1 | 
 (extend,24584,25626,,,) | 3963 | ExclusiveLock | 5611,3963
 
 I have several databases running on one host with one postmaster process, and 
 the ExclusiveLock is being waited on for many different oids. I suppose the only 
 thing common to all of them is that they are bigger than the others and they 
 almost never get updates or deletes (only inserts and reads). Some more info 
 about one of these tables is here [7].
 
 I have tried to look at the source code (src/backend/access/heap/hio.c) to 
 understand when the exclusive lock can be taken, but I could only read the 
 comments :) I have also examined the FSM for these tables and their indexes and 
 found that for most of them there are free pages, but there are also, for 
 example, such cases:
 
 rpopdb_p0=# select count(*) from pg_freespace('rpop.rpop_uidl') where avail != 0;
  count
 --------
  115953
 (1 row)
 
 rpopdb_p0=# select count(*) from pg_freespace('rpop.pk_rpop_uidl') where avail != 0;
  count
 -------
       0
 (1 row)
 
 rpopdb_p0=# \dS+ rpop.rpop_uidl
                               Table "rpop.rpop_uidl"
  Column |          Type          | Modifiers | Storage  | Stats target | Description
 --------+------------------------+-----------+----------+--------------+-------------
  popid  | bigint                 | not null  | plain    |              |
  uidl   | character varying(200) | not null  | extended |              |
 Indexes:
     "pk_rpop_uidl" PRIMARY KEY, btree (popid, uidl)
 Has OIDs: no
 
 rpopdb_p0=#
 
 
 My questions are:
 1. Do we consume 100% CPU (in system) trying to get a page from the FSM? Or does 
 it happen while acquiring the exclusive lock? How can I dig into it?
 2. By how much space do we extend the relation when we get an exclusive lock on 
 it?
 3. Why is the extended page not visible to other backends?
 4. Is there any possibility of a situation where backend A has got an exclusive 
 lock on some relation to extend it, then the OS CPU scheduler makes a context 
 switch to backend B, while backend B is waiting for an exclusive lock on the same 
 relation, and so on for many backends?
 5. (and the main question) What can I do to get rid of such situations? It is 
 a production cluster and I do not have any ideas what to do with this 
 situation :( Any help would be really appreciated.
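 
 (For reference, an untested sketch of a query to list backends waiting on
 relation extension locks:)
 
 select l.pid, l.relation::regclass, a.query
   from pg_locks l
   join pg_stat_activity a on a.pid = l.pid
  where l.locktype = 'extend' and not l.granted;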
 
 [1] 
 http://www.postgresql.org/message-id/8bca3aa10906011613l8ac2423h8153bbd2513dc...@mail.gmail.com
 [2] 
 http://pgsql.performance.narkive.com/IrkPbl3f/postgresql-9-2-3-performance-problem-caused-exclusive-locks
 [3] http://www.postgresql.org/message-id/50a2c93e.9070...@dalibo.com
 [4] 
 http://www.postgresql.org/message-id/cal_0b1sypyeoynkynv95nnv2d+4jxtug3hkkf6fahfw7gvg...@mail.gmail.com
 [5] http://pastebin.com/raw.php?i=Bd40Vn6h
 [6] http://wiki.postgresql.org/wiki/Lock_dependency_information
 [7] http://pastebin.com/raw.php?i=eGrtG524
 
 --
 Vladimir
 
 
 
 


Re: [PERFORM] Problem with ExclusiveLock on inserts

2014-02-12 Thread Ilya Kosmodemiansky
My question was actually about the barrier mount option: it is enabled by default 
on RHEL6/ext4 and can become a serious I/O bottleneck before the disks are 
actually involved. What does mount say without arguments?

 On Feb 12, 2014, at 18:43, Бородин Владимир r...@simply.name wrote:
 
 root@rpopdb01e ~ # fgrep data /etc/fstab
 UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 
 noatime,nodiratime 0 1
 root@rpopdb01e ~ #
 
 According to iostat the disks are not the bottleneck.
 

Re: [PERFORM] Problem with ExclusiveLock on inserts

2014-02-12 Thread Ilya Kosmodemiansky
Another thing which is arguable is the concurrency degree: how many of your
max_connections = 4000 are actually running? 4000 definitely looks like overkill,
and they could be a serious source of contention, especially since you have had
barriers enabled and software RAID.

Plus, with 32GB of shared buffers and synchronous_commit = on, especially under a
heavy workload, one should definitely have a BBU; otherwise performance will be
poor.
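
A transaction-level pooler usually helps here; a rough pgbouncer sketch (all
values are illustrative only, not tested against your workload):

    [pgbouncer]
    pool_mode = transaction
    default_pool_size = 32
    max_client_conn = 4000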


On Wed, Feb 12, 2014 at 8:20 PM, Бородин Владимир r...@simply.name wrote:

 Oh, I hadn't thought about barriers, sorry. Although I use software RAID 
 without a BBU, I have turned barriers off on one cluster shard to try it.

 root@rpopdb01e ~ # mount | fgrep data
 /dev/md2 on /var/lib/pgsql/9.3/data type ext4 (rw,noatime,nodiratime)
 root@rpopdb01e ~ # mount -o remount,nobarrier /dev/md2
 root@rpopdb01e ~ # mount | fgrep data
 /dev/md2 on /var/lib/pgsql/9.3/data type ext4
 (rw,noatime,nodiratime,nobarrier)
 root@rpopdb01e ~ #
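
To make the option survive a reboot, the fstab entry quoted earlier would
become something like this (untested):

UUID=f815fd3f-e4e4-43a6-a6a1-bce1203db3e0 /var/lib/pgsql/9.3/data ext4 noatime,nodiratime,nobarrier 0 1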


Re: [PERFORM] Problem with ExclusiveLock on inserts

2014-02-12 Thread Ilya Kosmodemiansky
On Wed, Feb 12, 2014 at 8:57 PM, Бородин Владимир r...@simply.name wrote:

 Yes, this is legacy, I will fix it. We had lots of inactive connections, but 
 right now we use pgbouncer for that. When the workload is normal we have around 
 80-120 backends, and fewer than 10 of them are in active state. When we hit the 
 problem with locks we get lots of sessions (sometimes more than 1000 of them 
 are in active state). According to vmstat the number of context switches is 
 not that big (less than 20k), so I don't think it is the main reason. Yes, it 
 can aggravate the problem, but IMHO not create it.


I'm afraid that is the problem: more than 1000 backends, most of them
simply waiting.
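
A quick way to check is, e.g.:

select state, count(*)
  from pg_stat_activity
 group by state
 order by count(*) desc;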



 I don't understand the correlation of shared buffers size and 
 synchronous_commit. Could you please explain your statement?


You need to fsync your huge shared buffers every time your database
performs a checkpoint. By default that usually happens too often, because
checkpoint_timeout is 5min by default. Without a BBU, on software RAID,
that leads to an I/O spike and your commits wait for WAL.
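
As a starting point you could spread checkpoints out, e.g. (values are only
illustrative and must be tuned for your hardware):

checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
checkpoint_segments = 64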


