Thanks Karl.  Restarting the job manually did fix the problem.

I might add a check for this in my software and kick the job into life again 
automatically, now I know it works…
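
A check of this kind could be sketched as below. This is only an illustration: the regular expression is modelled on the error text quoted later in this thread, while the dictionary keys `job_id` and `error_text` and the `restart_job` callback are hypothetical names, not a confirmed ManifoldCF schema or API.

```python
import re

# Modelled on the error quoted in this thread, e.g.
# "Error: Unexpected jobqueue status - record id 1417115392831,
#  expecting active status, saw 4"
JOBQUEUE_ERROR = re.compile(
    r"Unexpected jobqueue status - record id (\d+),\s+"
    r"expecting\s+(\w+)\s+status,\s+saw\s+(\d+)"
)


def jobs_needing_restart(statuses):
    """Return ids of jobs whose error text matches the jobqueue error.

    `statuses` is a list of dicts. The keys "job_id" and "error_text"
    are assumptions made for this sketch.
    """
    flagged = []
    for status in statuses:
        text = status.get("error_text") or ""
        if JOBQUEUE_ERROR.search(text):
            flagged.append(status["job_id"])
    return flagged


def restart_flagged(statuses, restart_job):
    """Call the caller-supplied restart_job(job_id) for each flagged job."""
    flagged = jobs_needing_restart(statuses)
    for job_id in flagged:
        restart_job(job_id)
    return flagged
```

Keeping the restart as a callback leaves open how the job is actually restarted (UI, API, or otherwise), which this thread does not specify.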

Adrian

From: Karl Wright [mailto:[email protected]]
Sent: 12 January 2015 11:34
To: [email protected]
Subject: Re: Error: Unexpected jobqueue status

Hi Adrian,
Just restarting the job should be sufficient to get it sorted out after this 
kind of failure.

Karl

On Mon, Jan 12, 2015 at 6:19 AM, Adrian Conlon <[email protected]> wrote:
Thanks Karl.

With regard to the runtime environment, apologies for omitting the PostgreSQL 
version.  It’s v9.3.

For the stack trace, I’ve just installed a JDK on the problematic server and 
tried out “jstack” (that’s a neat tool!), so I’m all systems go for the next 
time the agents process doesn’t respond to a stop request.

With regard to jobs that have this unexpected “jobqueue” status: do they sort 
themselves out the next time the job runs?  Is there anything I should do to 
“help” the job along?

Adrian

From: Karl Wright [mailto:[email protected]]
Sent: 12 January 2015 00:27
To: [email protected]
Subject: Re: Error: Unexpected jobqueue status

Also, if you are having trouble shutting down the agents process, it would be 
great if you could get a thread dump and post it, before you kill -9 it.
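
One way to make sure the dump is always taken first is to script the two steps together. The following is a rough sketch, assuming a JDK with `jstack` on the PATH and a known process id for the agents process; the function name `dump_then_kill` and the `dump_path` parameter are hypothetical.

```python
import os
import signal
import subprocess


def dump_then_kill(pid, dump_path, run=subprocess.run, kill=os.kill):
    """Save a jstack thread dump for `pid` to `dump_path`, then SIGKILL it.

    `run` and `kill` are injectable so the sequence can be exercised
    without a real JVM.
    """
    # `jstack -l <pid>` prints the thread dump with extra lock information.
    result = run(["jstack", "-l", str(pid)], capture_output=True, text=True)
    with open(dump_path, "w") as handle:
        handle.write(result.stdout)
    # Only once the dump is on disk, send the equivalent of `kill -9`.
    kill(pid, signal.SIGKILL)
    return result.returncode
```

The return value is the `jstack` exit code, so a caller can tell whether the saved dump is likely to be usable.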

Karl

On Sun, Jan 11, 2015 at 7:25 PM, Karl Wright <[email protected]> wrote:
Hi Adrian,
If you noted the comment stream in CONNECTORS-590, I was able to demonstrate 
conclusively that the problem was in Postgres.  I have not seen the problem in 
9.3, but that does not mean it's gone.  What version of Postgresql are you 
using?

In any case, while this problem definitely terminates your job, it will not 
happen very often.  I suspect the frequency of occurrence may depend on how 
loaded the database is.

Karl

On Sun, Jan 11, 2015 at 7:14 PM, Adrian Conlon <[email protected]> wrote:
Hi All,

I’m getting an occurrence of what looks very similar to CONNECTORS-590.

The circumstances are:


1) MCF jobs proceeding very slowly (looks like a PostgreSQL vacuum is needed)
2) Stop tomcat
3) Attempt to stop the agents normally
4) Wait a minute or two
5) Decide to “kill -9” the agents process
6) Vacuum the database
7) Restart tomcat
8) Restart the agents

When I checked the job status page, I found that two of the jobs (out of around 
4000 or so) had the following status (or very similar):

Error: Unexpected jobqueue status - record id 1417115392831, expecting active 
status, saw 4

Setup-wise, I’m running a release candidate of v1.8 (I think RC2), using 
PostgreSQL as the crawl database and running on Ubuntu Linux.  I’m using 
ZooKeeper-style synchronisation.

Let me know if more information etc. is needed or if you think it’s a new/real 
issue.
Adrian
