On Sat, Mar 16, 2013 at 11:25 PM, Bastiaan Olij <basti...@basenlily.me> wrote:
> Hi Leo,
>
> On 17/03/13 9:37 AM, Paragon Corporation wrote:
>> This has been an issue for a while but has only become a more common issue
>> for us recently and one we can more easily predict.
>>
>> We are running pgAgent on windows, and whenever we restart the server, if a
>> job is in the middle of a run when we restart, it gets stuck in a forever
>> endless running state.
>>
>> To fix the issue, we have to go into the pgagent.pga_job table and get rid
>> of the jobagentid that is in there for the specific job that is stuck.
>>
>> We are running the pgAgent 3.3.0 that is available via Stack Builder.
>>
> When the job starts pgAgent simply updates the status to running. When
> it finishes it updates the status to either failed or completed.
> When you restart in the middle of a job running that job gets
> interrupted but as pgAgent also shuts down it is not able to change the
> status on the job. When it restarts it has all but forgotten about that
> job that was running.
Actually, that's not what should happen. When an agent first runs, it records its backend PID in pga_jobagent. Then, when it starts a job, it sets the pga_job.jobagentid column to its backend PID as well. Later, when the agent restarts, it:

1) Checks the pga_jobagent table for agent PIDs that are thought to be running, and compares them with the PIDs listed in pg_stat_activity. Any that don't exist in pg_stat_activity are considered to be zombies and are recorded in a temp table.

2) Updates the job log and job step log for any jobs/steps that were being executed by a zombie agent, setting the status to 'd' (which, oddly, means aborted).

3) Resets the execution info in pga_job, so the job can execute again on its next schedule.

4) Finally, clears the zombie agent's entry from the pga_jobagent table.

As far as I'm aware, this has always worked well. The only caveat is that it will, of course, fail if, when the agent restarts, a new connection happens to have the same backend PID as the one the dead agent used. That means the zombie detection can conceivably fail from time to time, but that should be relatively rare. I'm not sure there's any easy way round that, though maybe something could be done with the application_name GUC that we have now.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgadmin-support mailing list (pgadmin-support@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgadmin-support
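The four restart steps described above can be modelled in memory to make the ordering and the PID-reuse caveat concrete. This is a minimal Python sketch, not pgAgent's actual code (pgAgent does this in SQL against the pgagent schema). The Job and JobLog classes and the recover_zombies function are hypothetical; the tables are reduced to sets and lists, "execution info" is reduced to the jobagentid column, and the job step log is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Job:
    """Hypothetical stand-in for a pgagent.pga_job row."""
    jobid: int
    jobagentid: Optional[int]  # backend PID of the agent running the job, or None


@dataclass
class JobLog:
    """Hypothetical stand-in for a job log row; 'r' = running, 'd' = aborted."""
    jobid: int
    status: str


def recover_zombies(agents: Set[int], active_pids: Set[int],
                    jobs: List[Job], job_log: List[JobLog]) -> Set[int]:
    """Sketch of the cleanup an agent performs on restart (assumed model)."""
    # 1) Registered agent PIDs missing from pg_stat_activity are zombies.
    zombies = {pid for pid in agents if pid not in active_pids}

    # 2) Mark running log entries of jobs held by a zombie agent as aborted ('d').
    #    This must happen before step 3, which clears jobagentid.
    zombie_jobs = {j.jobid for j in jobs if j.jobagentid in zombies}
    for entry in job_log:
        if entry.jobid in zombie_jobs and entry.status == "r":
            entry.status = "d"

    # 3) Reset the execution info so each job can run on its next schedule.
    for job in jobs:
        if job.jobagentid in zombies:
            job.jobagentid = None

    # 4) Remove the zombie agents from the registry (the pga_jobagent stand-in).
    agents -= zombies  # set.__isub__ mutates the caller's set in place
    return zombies


if __name__ == "__main__":
    agents = {4242, 5151}
    jobs = [Job(1, 4242), Job(2, 5151)]
    log = [JobLog(1, "r"), JobLog(2, "r")]
    # PID 4242 vanished with the server restart; 5151 is still connected.
    print(recover_zombies(agents, {5151}, jobs, log))  # {4242}
    print(jobs[0].jobagentid, log[0].status)  # None d
```

The same model shows the caveat: if a new, unrelated connection happens to reuse PID 4242, passing {4242, 5151} as active_pids yields no zombies, so job 1 keeps its stale jobagentid and stays "running", which is exactly the stuck state reported in the original question.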