2015-06-15 18:19 GMT-03:00 Bill Moran <wmo...@potentialtech.com>: > On Mon, 15 Jun 2015 17:48:54 -0300 > Anderson Valadares <anderva...@gmail.com> wrote: > > > 2015-06-12 19:56 GMT-03:00 Bill Moran <wmo...@potentialtech.com>: > > > > > Please do not remove the mailing list from replies. See below. > > > > > > On Fri, 12 Jun 2015 09:21:19 -0300 > > > Anderson Valadares <anderva...@gmail.com> wrote: > > > > > > > 2015-06-08 20:33 GMT-03:00 Bill Moran <wmo...@potentialtech.com>: > > > > > > > > > On Mon, 8 Jun 2015 11:59:31 -0300 > > > > > Anderson Valadares <anderva...@gmail.com> wrote: > > > > > > > > > > > Hi > > > > > > We are experiencing an intermittent problem in a GIS database > from a > > > > > > client. Some times of the day is as if the PostgreSQL executed > the > > > > > slowest > > > > > > operations see below an example. The query was performed three > times, > > > > > twice > > > > > > I canceled and the third time I left it to end. The consultation > > > took 10 > > > > > > seconds to finish, but the measured execution time is 20 ms. As > we > > > have > > > > > > several queries that run every 2 seconds when this momentary > delay > > > occurs > > > > > > queries accumulate and burst the limit of 203 connections > allowed. > > > The > > > > > > interval of "momentary stop" are approximately 2 seconds > occurring at > > > > > > random times and during these stoppages occur no locks and no > > > increased > > > > > > load on the server is identified. There is a job 2/2 seconds > locks > > > > > > collection of information, running queries, etc., nmon also > collects > > > > > > information every 2 seconds. My client asks what causes these > > > momentary > > > > > > stops? because it affects all operations of the database? etc. > How > > > do I > > > > > > identify what is causing these delays in executions of > operations in > > > the > > > > > > database? > > > > > > > > > > Number of points to consider: > > > > > * Check the contents of pg_locks and see if something is waiting > on a > > > > > lock when the query is slow. > > > > > > > > There is nothing waiting when the query is slow, see: > > > > > > > > snap_datetime | waiting | count > > > > -------------------------------+---------+------- > > > > 2015-06-05 09:25:00.954731-03 | f | 74 > > > > 2015-06-05 09:26:00.249187-03 | f | 205 > > > > 2015-06-05 09:27:00.826874-03 | f | 207 > > > > > > I don't know what that means. Since you don't show the query that > > > generated that output, I have no idea if your statement is valid, or > > > if you're running a query that will inherently produce incorrect > > > results. > > > > Sorry the information was incomplete. Below is the query that was > > performed to extract information from tables that used to monitor the > > database. > > The snap_datetime column indicates the time that occurred monitoring, > > the waiting column tells you whether any connection was on hold and the > > column > > count tells how many connections existed at the time. > > > > select > > s.snap_datetime > > ,a.waiting > > , count(*) > > from stat_snapshot s, stat_activity a > > where s.snap_id = a.snap_id > > and s.snap_datetime >= '2015-06-05 09:25:00' > > and s.snap_datetime <= '2015-06-05 09:28:00' > > group by > > s.snap_datetime > > ,a.waiting > > order by s.snap_datetime > > ; > > snap_datetime | waiting | count > > -------------------------------+---------+------- > > 2015-06-05 09:25:00.954731-03 | f | 74 > > 2015-06-05 09:26:00.249187-03 | f | 205 > > 2015-06-05 09:27:00.826874-03 | f | 207 > > (3 rows) > > Given what I'm seeing above, there's still a lot of information > missing. Those tables are not standard PostgreSQL tables, and I'm > not aware of any tool that creates them, thus I have to assume > it's a cron job or something similar created in-house. Given the > query you ran and the resultant data, my first guess is that the > data in stat_snapshot and stat_activity is worthless: i.e. it's > captured once per minute, and therefore there are 59+ seconds > worth of detail that aren't captured, thus the actual liklihood > that the those tables will contain any indication of the problem > is very low. >
Yes it is a cron, created at home, running every 1 hour gathering information from PostgreSQL tables (pg_stat_activity, pg_locks, etc). How to improve the collection of information PostgreSQL? > > > > > * Also, ensure log_lock_waits is turned on for the server and check > > > > > PostgreSQL's logs for messages about long lock waits. > > > > > > > > There is also nothing > > > > > > This is even worse than the previous statement. What did you _do_ to > > > determine this? > > > > > > Since you're unable to diagnose and fix the problem on your own, the > > > possibility exists that the problem is with your diagnostic steps. If > > > you don't share the details of those steps, I have no way to help you > > > with them. > > > > See the log in > > > https://github.com/andervalbh/serv-gis01/blob/master/postgresql-2015-06-05_000000.log > > Apologies, my above comment was intended to be in regards to the IO load. > > However, that log shows that you have massive lock contention. In the early > part of the log a process waits over 5 minutes for a share lock: > 2015-06-05 05:51:32 BRT [44236822] - [10.57.1.26] [14-1] > db=citgis,user=opegis LOG: process 44236822 acquired ShareLock on > transaction 3090377758 after 374146.817 ms > > Without seeing the source of the PL/pgSQL functions that seem to be > causing the problem, I can only suggest that you review those functions > for excessive locking and inefficient access. I'm going to guess, with > the large number of AccessExclusiveLocks in the log, that the functions > are doing explicit locking, and not doing so in an efficient manner. > This function PL / pgSQL has been fixed, but see that during the period in which the SQL presenting slowness problem (between 9:26:00 and 9:27:00) there is no block occurrences. > > > > > * Based on the connection behavior you describe, I'm guessing it's > a > > > > > Tomcat app using some sort of conection pool. Whatever it is, > tune > > > > > your connection pool settings so that the max size of the pool > > > doesn't > > > > > exceed the available PostgreSQL connections. At least that will > > > prevent > > > > > errors from happening when the problem occurs. > > > > > > > > Yes it is a tomcat application using connection pooling and will be > set > > > to > > > > not exceed the limit of PostgreSQL connections, but the central > point is > > > > that the number of connections has increased due to what I am calling > > > > "momentary stoppages" > > > > > > I understand that. I made the suggestion as a temporary fix to reduce > > > the impact until you are able to fix the actual cause. > > > > > > > > * The query you you keep cancelling below, run just EXPLAIN on it > (not > > > > > EXPLAIN ANALYZE) which will certainly finish and give you a plan > > > > > that can be reviewed to help determine what the problem is. > > > > > > > > Explain plan in http://explain.depesz.com/s/bWw > > > > > > I'm confused. The query at that URL only took 10ms to complete, which > > > doesn't seem at all unreasonable. > > > > I agree with you and that's why I ask why the delay in the execution at > > certain times > > as demonstrated above. > > > > > > * On the OS, monitor iostat or something else to see if you're > > > > > saturating disk capacity when the problem happens. > > > > > > > > The OS is also being monitored and there occurs no saturation. > > > > > > Again, without any details, it's possible that there is a problem > > > in your monitoring. > > > > > > > > See the monitoring spreadsheet generated from nmon in > > > https://github.com/andervalbh/serv-gis01/blob/master/transfacil07_150605_0000.nmon.rar > > > > > > > > > > > > The Server is a IBM P720 128G RAM PostgreSQL 9.2.9 on > > > > > > powerpc-ibm-aix7.1.0.0, compiled by gcc (GCC) 4.6.4, 64-bit > > > > > > > > > > > > Evolution of the number of connections for a period > > > > > > > > > > > > snap_datetime | #connections > > > > > > ----------------------------------------------+------------- > > > > > > 2015-06-05 09:25:00.954731-03 | 74 > > > > > > 2015-06-05 09:26:00.249187-03 | 205 > > > > > > 2015-06-05 09:27:00.826874-03 | 207 > > > > > > 2015-06-05 09:28:00.374666-03 | 73 > > > > > > 2015-06-05 09:29:00.690696-03 | 75 > > > > > > > > > > > > > > > > > > Occurrence of the problem > > > > > > > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as > > > > > dvi_cod_dados_via_iti, > > > > > > dvi_sub.via_cod_viagem > > > > > > from gis_dvi_dados_viagem_itinerario dvi_sub, > > > gis_via_viagem > > > > > > via_sub, gis_iti_itinerario iti_sub > > > > > > where dvi_sub.via_cod_viagem = > via_sub.via_cod_viagem > > > > > > and via_sub.via_status_viagem = 'A' > > > > > > and via_sub.via_dt_hora_ini > now() - interval '9 > > > hours' > > > > > > and iti_sub.lin_cod_linha = 389 > > > > > > and iti_sub.iti_sentido_itinerario = 'I' > > > > > > and iti_sub.iti_cod_itinerario_linha = > > > > > > via_sub.iti_cod_itinerario_linha > > > > > > and dvi_sub.vlo_cod_localizacao is not null > > > > > > group by dvi_sub.via_cod_viagem, > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto > > > > > > ; > > > > > > Cancel request sent > > > > > > ERROR: canceling statement due to user request > > > > > > Time: 2555.557 ms > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as > > > > > dvi_cod_dados_via_iti, > > > > > > dvi_sub.via_cod_viagem > > > > > > from gis_dvi_dados_viagem_itinerario dvi_sub, > > > gis_via_viagem > > > > > > via_sub, gis_iti_itinerario iti_sub > > > > > > where dvi_sub.via_cod_viagem = > via_sub.via_cod_viagem > > > > > > and via_sub.via_status_viagem = 'A' > > > > > > and via_sub.via_dt_hora_ini > now() - interval '9 > > > hours' > > > > > > and iti_sub.lin_cod_linha = 389 > > > > > > and iti_sub.iti_sentido_itinerario = 'I' > > > > > > and iti_sub.iti_cod_itinerario_linha = > > > > > > via_sub.iti_cod_itinerario_linha > > > > > > and dvi_sub.vlo_cod_localizacao is not null > > > > > > group by dvi_sub.via_cod_viagem, > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto > > > > > > ; > > > > > > Cancel request sent > > > > > > ERROR: canceling statement due to user request > > > > > > Time: 2192.572 ms > > > > > > citgis=# explain analyze select > max(dvi_sub.dvi_cod_dados_via_iti) as > > > > > > dvi_cod_dados_via_iti, > > > > > > dvi_sub.via_cod_viagem > > > > > > from gis_dvi_dados_viagem_itinerario dvi_sub, > > > gis_via_viagem > > > > > > via_sub, gis_iti_itinerario iti_sub > > > > > > where dvi_sub.via_cod_viagem = > via_sub.via_cod_viagem > > > > > > and via_sub.via_status_viagem = 'A' > > > > > > and via_sub.via_dt_hora_ini > now() - interval '9 > > > hours' > > > > > > and iti_sub.lin_cod_linha = 389 > > > > > > and iti_sub.iti_sentido_itinerario = 'I' > > > > > > and iti_sub.iti_cod_itinerario_linha = > > > > > > via_sub.iti_cod_itinerario_linha > > > > > > and dvi_sub.vlo_cod_localizacao is not null > > > > > > group by dvi_sub.via_cod_viagem, > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto > > > > > > ; > > > > > > Cancel request sent > > > > > > ERROR: canceling statement due to user request > > > > > > Time: 10164.015 ms > > > > > > citgis=# select now(); > > > > > > now > > > > > > ------------------------------- > > > > > > 2015-06-05 09:27:22.006072-03 > > > > > > (1 row) > > > > > > > > > > > > Time: 0.152 ms > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as > > > > > dvi_cod_dados_via_iti, > > > > > > dvi_sub.via_cod_viagem > > > > > > from gis_dvi_dados_viagem_itinerario dvi_sub, > > > gis_via_viagem > > > > > > via_sub, gis_iti_itinerario iti_sub > > > > > > where dvi_sub.via_cod_viagem = > via_sub.via_cod_viagem > > > > > > and via_sub.via_status_viagem = 'A' > > > > > > and via_sub.via_dt_hora_ini > now() - interval '9 > > > hours' > > > > > > and iti_sub.lin_cod_linha = 389 > > > > > > and iti_sub.iti_sentido_itinerario = 'I' > > > > > > and iti_sub.iti_cod_itinerario_linha = > > > > > > via_sub.iti_cod_itinerario_linha > > > > > > and dvi_sub.vlo_cod_localizacao is not null > > > > > > group by dvi_sub.via_cod_viagem, > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto > > > > > > ; > > > > > > dvi_cod_dados_via_iti | via_cod_viagem > > > > > > -----------------------+---------------- > > > > > > 1059964443 | 7989813 > > > > > > 1060072723 | 7990876 > > > > > > (2 rows) > > > > > > > > > > > > Time: 5565.175 ms > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as > > > > > dvi_cod_dados_via_iti, > > > > > > dvi_sub.via_cod_viagem > > > > > > from gis_dvi_dados_viagem_itinerario dvi_sub, > > > gis_via_viagem > > > > > > via_sub, gis_iti_itinerario iti_sub > > > > > > where dvi_sub.via_cod_viagem = > via_sub.via_cod_viagem > > > > > > and via_sub.via_status_viagem = 'A' > > > > > > and via_sub.via_dt_hora_ini > now() - interval '9 > > > hours' > > > > > > and iti_sub.lin_cod_linha = 389 > > > > > > and iti_sub.iti_sentido_itinerario = 'I' > > > > > > and iti_sub.iti_cod_itinerario_linha = > > > > > > via_sub.iti_cod_itinerario_linha > > > > > > and dvi_sub.vlo_cod_localizacao is not null > > > > > > group by dvi_sub.via_cod_viagem, > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto > > > > > > ; > > > > > > dvi_cod_dados_via_iti | via_cod_viagem > > > > > > -----------------------+---------------- > > > > > > 1059964445 | 7989813 > > > > > > 1060072725 | 7990876 > > > > > > (2 rows) > > > > > > > > > > > > Time: 27.944 ms > > > > > > citgis=# select max(dvi_sub.dvi_cod_dados_via_iti) as > > > > > dvi_cod_dados_via_iti, > > > > > > dvi_sub.via_cod_viagem > > > > > > from gis_dvi_dados_viagem_itinerario dvi_sub, > > > gis_via_viagem > > > > > > via_sub, gis_iti_itinerario iti_sub > > > > > > where dvi_sub.via_cod_viagem = > via_sub.via_cod_viagem > > > > > > and via_sub.via_status_viagem = 'A' > > > > > > and via_sub.via_dt_hora_ini > now() - interval '9 > > > hours' > > > > > > and iti_sub.lin_cod_linha = 389 > > > > > > and iti_sub.iti_sentido_itinerario = 'I' > > > > > > and iti_sub.iti_cod_itinerario_linha = > > > > > > via_sub.iti_cod_itinerario_linha > > > > > > and dvi_sub.vlo_cod_localizacao is not null > > > > > > group by dvi_sub.via_cod_viagem, > > > > > > iti_sub.iti_cod_itinerario_linha, via_sub.equ_cod_eqpto > > > > > > ; > > > > > > dvi_cod_dados_via_iti | via_cod_viagem > > > > > > -----------------------+---------------- > > > > > > 1059964445 | 7989813 > > > > > > 1060072727 | 7990876 > > > > > > (2 rows) > > > > > > > > > > > > Time: 24.428 ms > > > > > > > > > > > > Greetings > > > > > > > > > > > > Anderson > > > > > > > > > > > > > > > -- > > > > > Bill Moran <wmo...@potentialtech.com> > > > > > > > > > > > > > > -- > > > Bill Moran <wmo...@potentialtech.com> > > > > > > -- > Bill Moran <wmo...@potentialtech.com> >