Re: [GENERAL] Momentary Delay

2015-06-17 Thread Bill Moran
On Wed, 17 Jun 2015 10:33:37 -0300
Anderson Valadares anderva...@gmail.com wrote:

 2015-06-15 18:19 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
  On Mon, 15 Jun 2015 17:48:54 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   2015-06-12 19:56 GMT-03:00 Bill Moran wmo...@potentialtech.com:
  
Please do not remove the mailing list from replies. See below.
   
On Fri, 12 Jun 2015 09:21:19 -0300
Anderson Valadares anderva...@gmail.com wrote:
   
 2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:

  On Mon, 8 Jun 2015 11:59:31 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   Hi
    We are experiencing an intermittent problem in a client's GIS database.
    At certain times of day it is as if PostgreSQL executes operations much
    more slowly; see the example below. The query was run three times: twice
    I cancelled it, and the third time I let it finish. It took 10 seconds
    to complete, yet its measured execution time is only 20 ms. As we have
    several queries that run every 2 seconds, when this momentary delay
    occurs the queries accumulate and exceed the limit of 203 allowed
    connections. The momentary stops last approximately 2 seconds, occur at
    random times, and during them no locks and no increased load on the
    server are observed. There is a job 2/2 seconds locks collection of
    information, running queries, etc., nmon also collects information
    every 2 seconds. My client asks: what causes these momentary stops?
    Why do they affect all operations of the database? How do I identify
    what is causing these delays in operations in the database?
 
  Number of points to consider:
  * Check the contents of pg_locks and see if something is waiting
  on a
lock when the query is slow.

  There is nothing waiting when the query is slow, see:

  snap_datetime | waiting | count
 ---+-+---
  2015-06-05 09:25:00.954731-03 | f   |74
  2015-06-05 09:26:00.249187-03 | f   |   205
  2015-06-05 09:27:00.826874-03 | f   |   207
   
I don't know what that means. Since you don't show the query that
generated that output, I have no idea if your statement is valid, or
if you're running a query that will inherently produce incorrect
results.
  
    Sorry, the information was incomplete. Below is the query that was
    run to extract information from the tables we use to monitor the
    database. The snap_datetime column indicates when the snapshot was
    taken, the waiting column tells whether any connection was waiting,
    and the count column tells how many connections existed at that time.
  
   select
s.snap_datetime
,a.waiting
, count(*)
   from stat_snapshot s, stat_activity a
   where s.snap_id = a.snap_id
 and s.snap_datetime >= '2015-06-05 09:25:00'
 and s.snap_datetime <= '2015-06-05 09:28:00'
   group by
 s.snap_datetime
 ,a.waiting
   order by s.snap_datetime
   ;
snap_datetime | waiting | count
   ---+-+---
2015-06-05 09:25:00.954731-03 | f   |74
2015-06-05 09:26:00.249187-03 | f   |   205
2015-06-05 09:27:00.826874-03 | f   |   207
   (3 rows)
 
  Given what I'm seeing above, there's still a lot of information
  missing. Those tables are not standard PostgreSQL tables, and I'm
  not aware of any tool that creates them, thus I have to assume
  it's a cron job or something similar created in-house. Given the
  query you ran and the resultant data, my first guess is that the
  data in stat_snapshot and stat_activity is worthless: i.e. it's
  captured once per minute, and therefore there are 59+ seconds
   worth of detail that aren't captured, thus the actual likelihood
   that those tables will contain any indication of the problem
  is very low.
 
 Yes, it is a cron job, created in-house, running every hour, gathering
 information from PostgreSQL tables (pg_stat_activity, pg_locks, etc.).
 How can we improve the collection of PostgreSQL information?

Look at the log file; details of what is going on are recorded there
as they happen.
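
If the relevant logging settings aren't already enabled, a quick check
along these lines (just an illustration; the values shown are not from
your system, and on 9.2 the changes themselves go into postgresql.conf
followed by a reload) makes these stalls visible in the log:

  -- what is currently being logged
  SHOW log_lock_waits;              -- logs any wait longer than deadlock_timeout
  SHOW deadlock_timeout;
  SHOW log_min_duration_statement;  -- logs statements slower than this (in ms)

  -- illustrative postgresql.conf settings (reload afterwards):
  --   log_lock_waits = on
  --   log_min_duration_statement = 1000
  --   log_line_prefix = '%t [%p] db=%d,user=%u '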

However, given your statement below (claiming that nothing happened
during the observed slowdown between 9:26 and 9:27), you may want to
find someone who is actually capable of interpreting the log output,
as there are dozens of indicators of what's wrong. Immediately after
the problem disappears, a number of queries finish after spending a
lot of time executing, in addition to a number of complaints about
long lock waits. So all those queries were fighting each other, and
when they finally untangled themselves and got their work done, 

Re: [GENERAL] Momentary Delay

2015-06-17 Thread Anderson Valadares
2015-06-15 18:19 GMT-03:00 Bill Moran wmo...@potentialtech.com:

 On Mon, 15 Jun 2015 17:48:54 -0300
 Anderson Valadares anderva...@gmail.com wrote:

  2015-06-12 19:56 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
   Please do not remove the mailing list from replies. See below.
  
   On Fri, 12 Jun 2015 09:21:19 -0300
   Anderson Valadares anderva...@gmail.com wrote:
  
2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:
   
 On Mon, 8 Jun 2015 11:59:31 -0300
 Anderson Valadares anderva...@gmail.com wrote:

  Hi
   We are experiencing an intermittent problem in a client's GIS database.
   At certain times of day it is as if PostgreSQL executes operations much
   more slowly; see the example below. The query was run three times: twice
   I cancelled it, and the third time I let it finish. It took 10 seconds
   to complete, yet its measured execution time is only 20 ms. As we have
   several queries that run every 2 seconds, when this momentary delay
   occurs the queries accumulate and exceed the limit of 203 allowed
   connections. The momentary stops last approximately 2 seconds, occur at
   random times, and during them no locks and no increased load on the
   server are observed. There is a job 2/2 seconds locks collection of
   information, running queries, etc., nmon also collects information
   every 2 seconds. My client asks: what causes these momentary stops?
   Why do they affect all operations of the database? How do I identify
   what is causing these delays in operations in the database?

 Number of points to consider:
 * Check the contents of pg_locks and see if something is waiting
 on a
   lock when the query is slow.
   
 There is nothing waiting when the query is slow, see:
   
 snap_datetime | waiting | count
---+-+---
 2015-06-05 09:25:00.954731-03 | f   |74
 2015-06-05 09:26:00.249187-03 | f   |   205
 2015-06-05 09:27:00.826874-03 | f   |   207
  
   I don't know what that means. Since you don't show the query that
   generated that output, I have no idea if your statement is valid, or
   if you're running a query that will inherently produce incorrect
   results.
 
   Sorry, the information was incomplete. Below is the query that was
   run to extract information from the tables we use to monitor the
   database. The snap_datetime column indicates when the snapshot was
   taken, the waiting column tells whether any connection was waiting,
   and the count column tells how many connections existed at that time.
 
  select
   s.snap_datetime
   ,a.waiting
   , count(*)
  from stat_snapshot s, stat_activity a
  where s.snap_id = a.snap_id
and s.snap_datetime >= '2015-06-05 09:25:00'
and s.snap_datetime <= '2015-06-05 09:28:00'
  group by
s.snap_datetime
,a.waiting
  order by s.snap_datetime
  ;
   snap_datetime | waiting | count
  ---+-+---
   2015-06-05 09:25:00.954731-03 | f   |74
   2015-06-05 09:26:00.249187-03 | f   |   205
   2015-06-05 09:27:00.826874-03 | f   |   207
  (3 rows)

 Given what I'm seeing above, there's still a lot of information
 missing. Those tables are not standard PostgreSQL tables, and I'm
 not aware of any tool that creates them, thus I have to assume
 it's a cron job or something similar created in-house. Given the
 query you ran and the resultant data, my first guess is that the
 data in stat_snapshot and stat_activity is worthless: i.e. it's
 captured once per minute, and therefore there are 59+ seconds
 worth of detail that aren't captured, thus the actual likelihood
 that those tables will contain any indication of the problem
 is very low.


Yes, it is a cron job, created in-house, running every hour, gathering
information from PostgreSQL tables (pg_stat_activity, pg_locks, etc.).
How can we improve the collection of PostgreSQL information?




* Also, ensure log_lock_waits is turned on for the server and check
   PostgreSQL's logs for messages about long lock waits.
   
There is also nothing
  
   This is even worse than the previous statement. What did you _do_ to
   determine this?
  
   Since you're unable to diagnose and fix the problem on your own, the
   possibility exists that the problem is with your diagnostic steps. If
   you don't share the details of those steps, I have no way to help you
   with them.
 
  See the log in
 
 https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_00.log

 Apologies, my comment above was intended to be in regard to the I/O load.

 However, that log shows that you have massive lock contention. In the early
 part of the log a process waits over 5 minutes for a share lock:
 

Re: [GENERAL] Momentary Delay

2015-06-15 Thread Anderson Valadares
2015-06-12 19:56 GMT-03:00 Bill Moran wmo...@potentialtech.com:


 Please do not remove the mailing list from replies. See below.

 On Fri, 12 Jun 2015 09:21:19 -0300
 Anderson Valadares anderva...@gmail.com wrote:

  2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
   On Mon, 8 Jun 2015 11:59:31 -0300
   Anderson Valadares anderva...@gmail.com wrote:
  
 Hi
  We are experiencing an intermittent problem in a client's GIS database.
  At certain times of day it is as if PostgreSQL executes operations much
  more slowly; see the example below. The query was run three times: twice
  I cancelled it, and the third time I let it finish. It took 10 seconds
  to complete, yet its measured execution time is only 20 ms. As we have
  several queries that run every 2 seconds, when this momentary delay
  occurs the queries accumulate and exceed the limit of 203 allowed
  connections. The momentary stops last approximately 2 seconds, occur at
  random times, and during them no locks and no increased load on the
  server are observed. There is a job 2/2 seconds locks collection of
  information, running queries, etc., nmon also collects information
  every 2 seconds. My client asks: what causes these momentary stops?
  Why do they affect all operations of the database? How do I identify
  what is causing these delays in operations in the database?
  
   Number of points to consider:
   * Check the contents of pg_locks and see if something is waiting on a
 lock when the query is slow.
 
   There is nothing waiting when the query is slow, see:
 
   snap_datetime | waiting | count
  ---+-+---
   2015-06-05 09:25:00.954731-03 | f   |74
   2015-06-05 09:26:00.249187-03 | f   |   205
   2015-06-05 09:27:00.826874-03 | f   |   207

 I don't know what that means. Since you don't show the query that
 generated that output, I have no idea if your statement is valid, or
 if you're running a query that will inherently produce incorrect
 results.


Sorry, the information was incomplete. Below is the query that was
run to extract information from the tables we use to monitor the
database. The snap_datetime column indicates when the snapshot was
taken, the waiting column tells whether any connection was waiting,
and the count column tells how many connections existed at that time.

select
 s.snap_datetime
 ,a.waiting
 , count(*)
from stat_snapshot s, stat_activity a
where s.snap_id = a.snap_id
  and s.snap_datetime >= '2015-06-05 09:25:00'
  and s.snap_datetime <= '2015-06-05 09:28:00'
group by
  s.snap_datetime
  ,a.waiting
order by s.snap_datetime
;
 snap_datetime | waiting | count
---+-+---
 2015-06-05 09:25:00.954731-03 | f   |74
 2015-06-05 09:26:00.249187-03 | f   |   205
 2015-06-05 09:27:00.826874-03 | f   |   207
(3 rows)


  * Also, ensure log_lock_waits is turned on for the server and check
 PostgreSQL's logs for messages about long lock waits.
 
  There is also nothing

 This is even worse than the previous statement. What did you _do_ to
 determine this?

 Since you're unable to diagnose and fix the problem on your own, the
 possibility exists that the problem is with your diagnostic steps. If
 you don't share the details of those steps, I have no way to help you
 with them.


See the log in
https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_00.log


   * Based on the connection behavior you describe, I'm guessing it's a
  Tomcat app using some sort of connection pool. Whatever it is, tune
 your connection pool settings so that the max size of the pool
 doesn't
 exceed the available PostgreSQL connections. At least that will
 prevent
 errors from happening when the problem occurs.
 
  Yes, it is a Tomcat application using connection pooling, and it will
  be configured not to exceed the PostgreSQL connection limit. But the
  central point is that the number of connections increased because of
  what I am calling momentary stoppages.

 I understand that. I made the suggestion as a temporary fix to reduce
 the impact until you are able to fix the actual cause.

    * The query you keep cancelling below: run just EXPLAIN on it (not
 EXPLAIN ANALYZE) which will certainly finish and give you a plan
 that can be reviewed to help determine what the problem is.
 
  Explain plan in http://explain.depesz.com/s/bWw

 I'm confused. The query at that URL only took 10ms to complete, which
 doesn't seem at all unreasonable.


I agree with you, and that is why I am asking what causes the delay in
execution at certain times, as demonstrated above.



  * On the OS, monitor iostat or something else to see if you're
 saturating disk capacity when the problem happens.
 
  The OS is also being monitored and 

Re: [GENERAL] Momentary Delay

2015-06-15 Thread Bill Moran
On Mon, 15 Jun 2015 17:48:54 -0300
Anderson Valadares anderva...@gmail.com wrote:

 2015-06-12 19:56 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
  Please do not remove the mailing list from replies. See below.
 
  On Fri, 12 Jun 2015 09:21:19 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:
  
On Mon, 8 Jun 2015 11:59:31 -0300
Anderson Valadares anderva...@gmail.com wrote:
   
 Hi
  We are experiencing an intermittent problem in a client's GIS database.
  At certain times of day it is as if PostgreSQL executes operations much
  more slowly; see the example below. The query was run three times: twice
  I cancelled it, and the third time I let it finish. It took 10 seconds
  to complete, yet its measured execution time is only 20 ms. As we have
  several queries that run every 2 seconds, when this momentary delay
  occurs the queries accumulate and exceed the limit of 203 allowed
  connections. The momentary stops last approximately 2 seconds, occur at
  random times, and during them no locks and no increased load on the
  server are observed. There is a job 2/2 seconds locks collection of
  information, running queries, etc., nmon also collects information
  every 2 seconds. My client asks: what causes these momentary stops?
  Why do they affect all operations of the database? How do I identify
  what is causing these delays in operations in the database?
   
Number of points to consider:
* Check the contents of pg_locks and see if something is waiting on a
  lock when the query is slow.
  
There is nothing waiting when the query is slow, see:
  
snap_datetime | waiting | count
   ---+-+---
2015-06-05 09:25:00.954731-03 | f   |74
2015-06-05 09:26:00.249187-03 | f   |   205
2015-06-05 09:27:00.826874-03 | f   |   207
 
  I don't know what that means. Since you don't show the query that
  generated that output, I have no idea if your statement is valid, or
  if you're running a query that will inherently produce incorrect
  results.
 
 Sorry, the information was incomplete. Below is the query that was
 run to extract information from the tables we use to monitor the
 database. The snap_datetime column indicates when the snapshot was
 taken, the waiting column tells whether any connection was waiting,
 and the count column tells how many connections existed at that time.
 
 select
  s.snap_datetime
  ,a.waiting
  , count(*)
 from stat_snapshot s, stat_activity a
 where s.snap_id = a.snap_id
   and s.snap_datetime >= '2015-06-05 09:25:00'
   and s.snap_datetime <= '2015-06-05 09:28:00'
 group by
   s.snap_datetime
   ,a.waiting
 order by s.snap_datetime
 ;
  snap_datetime | waiting | count
 ---+-+---
  2015-06-05 09:25:00.954731-03 | f   |74
  2015-06-05 09:26:00.249187-03 | f   |   205
  2015-06-05 09:27:00.826874-03 | f   |   207
 (3 rows)

Given what I'm seeing above, there's still a lot of information
missing. Those tables are not standard PostgreSQL tables, and I'm
not aware of any tool that creates them, thus I have to assume
it's a cron job or something similar created in-house. Given the
query you ran and the resultant data, my first guess is that the
data in stat_snapshot and stat_activity is worthless: i.e. it's
captured once per minute, and therefore there are 59+ seconds
worth of detail that aren't captured, thus the actual likelihood
that those tables will contain any indication of the problem
is very low.

   * Also, ensure log_lock_waits is turned on for the server and check
  PostgreSQL's logs for messages about long lock waits.
  
   There is also nothing
 
  This is even worse than the previous statement. What did you _do_ to
  determine this?
 
  Since you're unable to diagnose and fix the problem on your own, the
  possibility exists that the problem is with your diagnostic steps. If
  you don't share the details of those steps, I have no way to help you
  with them.
 
 See the log in
 https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_00.log

Apologies, my comment above was intended to be in regard to the I/O load.

However, that log shows that you have massive lock contention. In the early
part of the log a process waits over 5 minutes for a share lock:
2015-06-05 05:51:32 BRT [44236822] - [10.57.1.26] [14-1] db=citgis,user=opegis
  LOG:  process 44236822 acquired ShareLock on transaction 3090377758 after 374146.817 ms
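
When one of those waits is happening live, pairing the waiting and the
granted entries in pg_locks along these lines (only a rough sketch: it
matches on relation/transactionid, and 9.2 has no pg_blocking_pids(),
so treat the output as a hint rather than a verdict) shows who is
blocking whom:

  SELECT blocked.pid       AS blocked_pid,
         blocked_act.query AS blocked_query,
         blocker.pid       AS blocking_pid,
         blocker_act.query AS blocking_query
    FROM pg_locks blocked
    JOIN pg_locks blocker
      ON blocker.granted
     AND NOT blocked.granted
     AND blocker.pid <> blocked.pid
     AND blocker.database IS NOT DISTINCT FROM blocked.database
     AND blocker.relation IS NOT DISTINCT FROM blocked.relation
     AND blocker.transactionid IS NOT DISTINCT FROM blocked.transactionid
     AND (blocked.relation IS NOT NULL OR blocked.transactionid IS NOT NULL)
    JOIN pg_stat_activity blocked_act ON blocked_act.pid = blocked.pid
    JOIN pg_stat_activity blocker_act ON blocker_act.pid = blocker.pid;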

Without seeing the source of the PL/pgSQL functions that seem to be
causing the problem, I can only suggest that you review those functions
for excessive locking and inefficient access. I'm going to guess, with
the 

Re: [GENERAL] Momentary Delay

2015-06-12 Thread Bill Moran

Please do not remove the mailing list from replies. See below.

On Fri, 12 Jun 2015 09:21:19 -0300
Anderson Valadares anderva...@gmail.com wrote:

 2015-06-08 20:33 GMT-03:00 Bill Moran wmo...@potentialtech.com:
 
  On Mon, 8 Jun 2015 11:59:31 -0300
  Anderson Valadares anderva...@gmail.com wrote:
 
   Hi
    We are experiencing an intermittent problem in a client's GIS database.
    At certain times of day it is as if PostgreSQL executes operations much
    more slowly; see the example below. The query was run three times: twice
    I cancelled it, and the third time I let it finish. It took 10 seconds
    to complete, yet its measured execution time is only 20 ms. As we have
    several queries that run every 2 seconds, when this momentary delay
    occurs the queries accumulate and exceed the limit of 203 allowed
    connections. The momentary stops last approximately 2 seconds, occur at
    random times, and during them no locks and no increased load on the
    server are observed. There is a job 2/2 seconds locks collection of
    information, running queries, etc., nmon also collects information
    every 2 seconds. My client asks: what causes these momentary stops?
    Why do they affect all operations of the database? How do I identify
    what is causing these delays in operations in the database?
 
  Number of points to consider:
  * Check the contents of pg_locks and see if something is waiting on a
lock when the query is slow.
 
  There is nothing waiting when the query is slow, see:
 
  snap_datetime | waiting | count
 ---+-+---
  2015-06-05 09:25:00.954731-03 | f   |74
  2015-06-05 09:26:00.249187-03 | f   |   205
  2015-06-05 09:27:00.826874-03 | f   |   207

I don't know what that means. Since you don't show the query that
generated that output, I have no idea if your statement is valid, or
if you're running a query that will inherently produce incorrect
results.

 * Also, ensure log_lock_waits is turned on for the server and check
PostgreSQL's logs for messages about long lock waits.
 
 There is also nothing

This is even worse than the previous statement. What did you _do_ to
determine this?

Since you're unable to diagnose and fix the problem on your own, the
possibility exists that the problem is with your diagnostic steps. If
you don't share the details of those steps, I have no way to help you
with them.

  * Based on the connection behavior you describe, I'm guessing it's a
Tomcat app using some sort of connection pool. Whatever it is, tune
your connection pool settings so that the max size of the pool doesn't
exceed the available PostgreSQL connections. At least that will prevent
errors from happening when the problem occurs.
 
 Yes, it is a Tomcat application using connection pooling, and it will be
 configured not to exceed the PostgreSQL connection limit. But the central
 point is that the number of connections increased because of what I am
 calling momentary stoppages.

I understand that. I made the suggestion as a temporary fix to reduce
the impact until you are able to fix the actual cause.
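
The two numbers to compare when you size that pool can be read straight
off the server; something like this (trivial, shown only as an example):

  SHOW max_connections;                  -- the server-side ceiling
  SELECT count(*) AS current_connections
    FROM pg_stat_activity;               -- what is in use right now

Whatever the pool's maximum is called on your side (maxActive in Tomcat's
JDBC pool, for instance), keep it a bit below max_connections so there is
headroom left for maintenance and superuser sessions.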

  * The query you keep cancelling below: run just EXPLAIN on it (not
EXPLAIN ANALYZE) which will certainly finish and give you a plan
that can be reviewed to help determine what the problem is.
 
 Explain plan in http://explain.depesz.com/s/bWw

I'm confused. The query at that URL only took 10ms to complete, which
doesn't seem at all unreasonable.

 * On the OS, monitor iostat or something else to see if you're
saturating disk capacity when the problem happens.
 
 The OS is also being monitored and no saturation occurs.

Again, without any details, it's possible that there is a problem
in your monitoring.

 
  The server is an IBM P720 with 128 GB of RAM, running PostgreSQL 9.2.9 on
   powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
  
   Evolution of the number of connections for a period
  
   snap_datetime|  #connections
   --+-
   2015-06-05 09:25:00.954731-03 | 74
   2015-06-05 09:26:00.249187-03 | 205
   2015-06-05 09:27:00.826874-03 | 207
   2015-06-05 09:28:00.374666-03 | 73
   2015-06-05 09:29:00.690696-03 | 75
  
  
   Occurrence of the problem
  
   citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
  dvi_cod_dados_via_iti,
   dvi_sub.via_cod_viagem
from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
   via_sub, gis_iti_itinerario iti_sub
where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
  and via_sub.via_status_viagem = 'A'
  and via_sub.via_dt_hora_ini > now() - interval '9 hours'
  and iti_sub.lin_cod_linha = 389
  and iti_sub.iti_sentido_itinerario = 'I'
  and iti_sub.iti_cod_itinerario_linha =
   

Re: [GENERAL] Momentary Delay

2015-06-08 Thread Anderson Valadares
2015-06-08 13:58 GMT-03:00 Adrian Klaver adrian.kla...@aklaver.com:

 On 06/08/2015 09:25 AM, Anderson Valadares wrote:

 2015-06-08 12:30 GMT-03:00 Adrian Klaver adrian.kla...@aklaver.com
 mailto:adrian.kla...@aklaver.com:





 Adrian,
   Forgive my English. I meant that there is a job that runs every 2 seconds
 capturing information about locks, long-running queries, etc.


 So have you looked at the Postgres logs during these time periods to see
 if there is anything relevant?




 Anderson



 --
 Adrian Klaver
 adrian.kla...@aklaver.com


I saw the logs and there is nothing wrong.


Re: [GENERAL] Momentary Delay

2015-06-08 Thread Adrian Klaver

On 06/08/2015 09:25 AM, Anderson Valadares wrote:

2015-06-08 12:30 GMT-03:00 Adrian Klaver adrian.kla...@aklaver.com
mailto:adrian.kla...@aklaver.com:







Adrian,
  Forgive my English. I meant that there is a job that runs every 2 seconds
capturing information about locks, long-running queries, etc.


So have you looked at the Postgres logs during these time periods to see 
if there is anything relevant?





Anderson



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Momentary Delay

2015-06-08 Thread Bill Moran
On Mon, 8 Jun 2015 11:59:31 -0300
Anderson Valadares anderva...@gmail.com wrote:

 Hi
  We are experiencing an intermittent problem in a client's GIS database.
 At certain times of day it is as if PostgreSQL executes operations much
 more slowly; see the example below. The query was run three times: twice
 I cancelled it, and the third time I let it finish. It took 10 seconds
 to complete, yet its measured execution time is only 20 ms. As we have
 several queries that run every 2 seconds, when this momentary delay
 occurs the queries accumulate and exceed the limit of 203 allowed
 connections. The momentary stops last approximately 2 seconds, occur at
 random times, and during them no locks and no increased load on the
 server are observed. There is a job 2/2 seconds locks collection of
 information, running queries, etc., nmon also collects information
 every 2 seconds. My client asks: what causes these momentary stops?
 Why do they affect all operations of the database? How do I identify
 what is causing these delays in operations in the database?

Number of points to consider:
* Check the contents of pg_locks and see if something is waiting on a
  lock when the query is slow (see the example query after this list).
* Also, ensure log_lock_waits is turned on for the server and check
  PostgreSQL's logs for messages about long lock waits.
* Make sure shared_buffers isn't larger than 16G. I've seen issues with
  PG 9.X and shared_buffers settings larger than that causing
  intermittent stalls.
* Based on the connection behavior you describe, I'm guessing it's a
  Tomcat app using some sort of connection pool. Whatever it is, tune
  your connection pool settings so that the max size of the pool doesn't
  exceed the available PostgreSQL connections. At least that will prevent
  errors from happening when the problem occurs.
* The query you keep cancelling below: run just EXPLAIN on it (not
  EXPLAIN ANALYZE), which will certainly finish and give you a plan
  that can be reviewed to help determine what the problem is.
* On the OS, monitor iostat or something else to see if you're
  saturating disk capacity when the problem happens.
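
For that first point, something along these lines (a sketch, using the
standard 9.2 catalog views) shows anything currently waiting on a lock,
together with what it is trying to run:

  SELECT l.pid, a.usename, l.locktype, l.mode,
         now() - a.query_start AS waiting_for,
         a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE NOT l.granted;

Run it while the slowdown is actually happening; a once-a-minute
snapshot will usually miss it.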

 The server is an IBM P720 with 128 GB of RAM, running PostgreSQL 9.2.9 on
 powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit
 
 Evolution of the number of connections for a period
 
 snap_datetime|  #connections
 --+-
 2015-06-05 09:25:00.954731-03 | 74
 2015-06-05 09:26:00.249187-03 | 205
 2015-06-05 09:27:00.826874-03 | 207
 2015-06-05 09:28:00.374666-03 | 73
 2015-06-05 09:29:00.690696-03 | 75
 
 
 Occurrence of the problem
 
 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
 via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
 Cancel request sent
 ERROR:  canceling statement due to user request
 Time: 2555.557 ms
 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
 via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
 Cancel request sent
 ERROR:  canceling statement due to user request
 Time: 2192.572 ms
 citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
 dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
 via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and 

Re: [GENERAL] Momentary Delay

2015-06-08 Thread Adrian Klaver

On 06/08/2015 07:59 AM, Anderson Valadares wrote:

Hi
  We are experiencing an intermittent problem in a client's GIS database.
At certain times of day it is as if PostgreSQL executes operations much
more slowly; see the example below. The query was run three times: twice
I cancelled it, and the third time I let it finish. It took 10 seconds
to complete, yet its measured execution time is only 20 ms. As we have
several queries that run every 2 seconds, when this momentary delay
occurs the queries accumulate and exceed the limit of 203 allowed
connections. The momentary stops last approximately 2 seconds, occur at
random times, and during them no locks and no increased load on the
server are observed. There is a job 2/2 seconds locks collection of
information, running queries, etc., nmon also collects information
every 2 seconds.


I do not have an answer, but I do have a question. What does:

There is a job
2/2 seconds locks collection of information, running queries, etc

mean?


My client asks: what causes these momentary stops? Why do they affect
all operations of the database? How do I identify what is causing these
delays in operations in the database?

The server is an IBM P720 with 128 GB of RAM, running PostgreSQL 9.2.9 on
powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit

Evolution of the number of connections for a period

snap_datetime   |  #connections
--+-
2015-06-05 09:25:00.954731-03 |74
2015-06-05 09:26:00.249187-03 |205
2015-06-05 09:27:00.826874-03 |207
2015-06-05 09:28:00.374666-03 |73
2015-06-05 09:29:00.690696-03 |75


Occurrence of the problem

citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2555.557 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2192.572 ms
citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
and dvi_sub.vlo_cod_localizacao is not null
  group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 10164.015 ms
citgis=# select now();
   now
---
  2015-06-05 09:27:22.006072-03
(1 row)

Time: 0.152 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
 dvi_sub.via_cod_viagem
  from gis_dvi_dados_viagem_itinerario dvi_sub,
gis_via_viagem via_sub, gis_iti_itinerario iti_sub
  where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
and via_sub.via_status_viagem = 'A'
and via_sub.via_dt_hora_ini > now() - interval '9 hours'
and iti_sub.lin_cod_linha = 389
and iti_sub.iti_sentido_itinerario = 'I'
and 

[GENERAL] Momentary Delay

2015-06-08 Thread Anderson Valadares
Hi
 We are experiencing an intermittent problem in a client's GIS database.
At certain times of day it is as if PostgreSQL executes operations much
more slowly; see the example below. The query was run three times: twice
I cancelled it, and the third time I let it finish. It took 10 seconds
to complete, yet its measured execution time is only 20 ms. As we have
several queries that run every 2 seconds, when this momentary delay
occurs the queries accumulate and exceed the limit of 203 allowed
connections. The momentary stops last approximately 2 seconds, occur at
random times, and during them no locks and no increased load on the
server are observed. There is a job 2/2 seconds locks collection of
information, running queries, etc., nmon also collects information
every 2 seconds. My client asks: what causes these momentary stops? Why
do they affect all operations of the database? How do I identify what
is causing these delays in operations in the database?

The server is an IBM P720 with 128 GB of RAM, running PostgreSQL 9.2.9 on
powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit

Evolution of the number of connections for a period

snap_datetime|  #connections
--+-
2015-06-05 09:25:00.954731-03 | 74
2015-06-05 09:26:00.249187-03 | 205
2015-06-05 09:27:00.826874-03 | 207
2015-06-05 09:28:00.374666-03 | 73
2015-06-05 09:29:00.690696-03 | 75


Occurrence of the problem

citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
dvi_sub.via_cod_viagem
 from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
via_sub, gis_iti_itinerario iti_sub
 where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
   and via_sub.via_status_viagem = 'A'
   and via_sub.via_dt_hora_ini > now() - interval '9 hours'
   and iti_sub.lin_cod_linha = 389
   and iti_sub.iti_sentido_itinerario = 'I'
   and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
   and dvi_sub.vlo_cod_localizacao is not null
 group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2555.557 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
dvi_sub.via_cod_viagem
 from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
via_sub, gis_iti_itinerario iti_sub
 where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
   and via_sub.via_status_viagem = 'A'
   and via_sub.via_dt_hora_ini > now() - interval '9 hours'
   and iti_sub.lin_cod_linha = 389
   and iti_sub.iti_sentido_itinerario = 'I'
   and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
   and dvi_sub.vlo_cod_localizacao is not null
 group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 2192.572 ms
citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
dvi_cod_dados_via_iti,
dvi_sub.via_cod_viagem
 from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
via_sub, gis_iti_itinerario iti_sub
 where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
   and via_sub.via_status_viagem = 'A'
   and via_sub.via_dt_hora_ini > now() - interval '9 hours'
   and iti_sub.lin_cod_linha = 389
   and iti_sub.iti_sentido_itinerario = 'I'
   and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
   and dvi_sub.vlo_cod_localizacao is not null
 group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
Cancel request sent
ERROR:  canceling statement due to user request
Time: 10164.015 ms
citgis=# select now();
  now
---
 2015-06-05 09:27:22.006072-03
(1 row)

Time: 0.152 ms
citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as dvi_cod_dados_via_iti,
dvi_sub.via_cod_viagem
 from gis_dvi_dados_viagem_itinerario dvi_sub, gis_via_viagem
via_sub, gis_iti_itinerario iti_sub
 where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
   and via_sub.via_status_viagem = 'A'
   and via_sub.via_dt_hora_ini > now() - interval '9 hours'
   and iti_sub.lin_cod_linha = 389
   and iti_sub.iti_sentido_itinerario = 'I'
   and iti_sub.iti_cod_itinerario_linha =
via_sub.iti_cod_itinerario_linha
   and dvi_sub.vlo_cod_localizacao is not null
 group by dvi_sub.via_cod_viagem,
iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
;
 dvi_cod_dados_via_iti | 

Re: [GENERAL] Momentary Delay

2015-06-08 Thread Anderson Valadares
2015-06-08 12:30 GMT-03:00 Adrian Klaver adrian.kla...@aklaver.com:

 On 06/08/2015 07:59 AM, Anderson Valadares wrote:

 Hi
   We are experiencing an intermittent problem in a client's GIS database.
 At certain times of day it is as if PostgreSQL executes operations much
 more slowly; see the example below. The query was run three times: twice
 I cancelled it, and the third time I let it finish. It took 10 seconds
 to complete, yet its measured execution time is only 20 ms. As we have
 several queries that run every 2 seconds, when this momentary delay
 occurs the queries accumulate and exceed the limit of 203 allowed
 connections. The momentary stops last approximately 2 seconds, occur at
 random times, and during them no locks and no increased load on the
 server are observed. There is a job 2/2 seconds locks collection of
 information, running queries, etc., nmon also collects information
 every 2 seconds.


 I do not have an answer, but I do have a question. What does:

 There is a job
 2/2 seconds locks collection of information, running queries, etc

 mean?



 My client asks: what causes these momentary stops? Why do they affect
 all operations of the database? How do I identify what is causing these
 delays in operations in the database?

 The server is an IBM P720 with 128 GB of RAM, running PostgreSQL 9.2.9 on
 powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit

 Evolution of the number of connections for a period

 snap_datetime   |  #connections
 --+-
 2015-06-05 09:25:00.954731-03 |74
 2015-06-05 09:26:00.249187-03 |205
 2015-06-05 09:27:00.826874-03 |207
 2015-06-05 09:28:00.374666-03 |73
 2015-06-05 09:29:00.690696-03 |75


 Occurrence of the problem

 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
 dvi_cod_dados_via_iti,
  dvi_sub.via_cod_viagem
   from gis_dvi_dados_viagem_itinerario dvi_sub,
 gis_via_viagem via_sub, gis_iti_itinerario iti_sub
   where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
 and via_sub.via_status_viagem = 'A'
 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
 and iti_sub.lin_cod_linha = 389
 and iti_sub.iti_sentido_itinerario = 'I'
 and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
 and dvi_sub.vlo_cod_localizacao is not null
   group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
 Cancel request sent
 ERROR:  canceling statement due to user request
 Time: 2555.557 ms
 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
 dvi_cod_dados_via_iti,
  dvi_sub.via_cod_viagem
   from gis_dvi_dados_viagem_itinerario dvi_sub,
 gis_via_viagem via_sub, gis_iti_itinerario iti_sub
   where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
 and via_sub.via_status_viagem = 'A'
 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
 and iti_sub.lin_cod_linha = 389
 and iti_sub.iti_sentido_itinerario = 'I'
 and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
 and dvi_sub.vlo_cod_localizacao is not null
   group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
 Cancel request sent
 ERROR:  canceling statement due to user request
 Time: 2192.572 ms
 citgis=# explain analyze select max(dvi_sub.dvi_cod_dados_via_iti) as
 dvi_cod_dados_via_iti,
  dvi_sub.via_cod_viagem
   from gis_dvi_dados_viagem_itinerario dvi_sub,
 gis_via_viagem via_sub, gis_iti_itinerario iti_sub
   where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
 and via_sub.via_status_viagem = 'A'
 and via_sub.via_dt_hora_ini > now() - interval '9 hours'
 and iti_sub.lin_cod_linha = 389
 and iti_sub.iti_sentido_itinerario = 'I'
 and iti_sub.iti_cod_itinerario_linha =
 via_sub.iti_cod_itinerario_linha
 and dvi_sub.vlo_cod_localizacao is not null
   group by dvi_sub.via_cod_viagem,
 iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto
 ;
 Cancel request sent
 ERROR:  canceling statement due to user request
 Time: 10164.015 ms
 citgis=# select now();
now
 ---
   2015-06-05 09:27:22.006072-03
 (1 row)

 Time: 0.152 ms
 citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as
 dvi_cod_dados_via_iti,
  dvi_sub.via_cod_viagem
   from gis_dvi_dados_viagem_itinerario dvi_sub,
 gis_via_viagem via_sub, gis_iti_itinerario iti_sub
   where dvi_sub.via_cod_viagem = via_sub.via_cod_viagem
 and via_sub.via_status_viagem = 'A'
 and via_sub.via_dt_hora_ini