Hi Adrian,

Just restarting the job should be sufficient to get it sorted out after this kind of failure.
Karl

On Mon, Jan 12, 2015 at 6:19 AM, Adrian Conlon <[email protected]> wrote:

> Thanks Karl.
>
> With regards the runtime environment, apologies for the omission of
> Postgresql version. It’s v9.3.
>
> For the stack trace, I’ve just installed a jdk on the problematic server
> and tried out “jstack” (that’s a neat tool!), so I’m all systems go for the
> next time the agents process doesn’t respond to a stop request.
>
> With regards jobs that have this unexpected “jobqueue” status; do they
> sort themselves out the next time the job runs? Is there anything I should
> do to “help” the job along?
>
> Adrian
>
> *From:* Karl Wright [mailto:[email protected]]
> *Sent:* 12 January 2015 00:27
> *To:* [email protected]
> *Subject:* Re: Error: Unexpected jobqueue status
>
> Also, if you are having trouble shutting down the agents process, it would
> be great if you could get a thread dump and post it, before you kill -9 it.
>
> Karl
>
> On Sun, Jan 11, 2015 at 7:25 PM, Karl Wright <[email protected]> wrote:
>
> Hi Adrian,
>
> If you noted the comment stream in CONNECTORS-590, I was able to
> demonstrate conclusively that the problem was in Postgres. I have not seen
> the problem in 9.3, but that does not mean it's gone. What version of
> Postgresql are you using?
>
> In any case, while this problem definitely terminates your job, it will
> not happen very often. I suspect the frequency of occurrence may depend on
> how loaded the database is.
>
> Karl
>
> On Sun, Jan 11, 2015 at 7:14 PM, Adrian Conlon <[email protected]>
> wrote:
>
> Hi All,
>
> I’m getting an occurrence of what looks very similar to CONNECTORS-590.
>
> The circumstances are:
>
> 1) MCF Jobs proceeding very slowly (looks like a Postgresql vacuum
>    is needed)
> 2) Stop tomcat
> 3) Attempt to stop the agents normally
> 4) Wait a minute or two
> 5) Decide to “kill -9” the agents process
> 6) Vacuum the database
> 7) Restart tomcat
> 8) Restart the agents
>
> When I checked the job status page, I found that two of the jobs (out
> of around 4000 or so) had the following status (or very similar):
>
> Error: Unexpected jobqueue status - record id 1417115392831, expecting
> active status, saw 4
>
> Setup-wise, I’m running a release candidate of v1.8 (I think RC2),
> using postgresql as the crawl database and running on Ubuntu Linux. I’m
> using zookeeper style synchronisation.
>
> Let me know if more information etc. is needed or if you think it’s a
> new/real issue.
>
> Adrian
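
The advice earlier in this thread (get a thread dump before resorting to kill -9) could be scripted roughly as below. This is only a sketch under assumptions: a POSIX host, "jstack" from a JDK on the PATH, and a normal stop request for the agents process already issued by other means. The function names, the dump path argument and the two-minute default are made up for illustration and are not part of ManifoldCF.

```python
import os
import signal
import subprocess
import time


def process_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, timeout_s, poll_s=1.0, alive=process_alive, sleep=time.sleep):
    """Poll until the process exits or the timeout elapses; True if it exited."""
    waited = 0.0
    while waited < timeout_s:
        if not alive(pid):
            return True
        sleep(poll_s)
        waited += poll_s
    return not alive(pid)


def dump_then_kill(pid, dump_path, timeout_s=120.0):
    """Wait for a clean exit; otherwise save a jstack dump, then send SIGKILL.

    If jstack cannot be started, subprocess.run raises and the process is
    left running, so it is never killed without an attempt at a dump.
    """
    if wait_for_exit(pid, timeout_s):
        return "exited"
    with open(dump_path, "w") as out:
        subprocess.run(["jstack", str(pid)], stdout=out,
                       stderr=subprocess.STDOUT, check=False)
    os.kill(pid, signal.SIGKILL)  # the "kill -9" step
    return "killed"
```

Calling dump_then_kill(agents_pid, "agents-threads.txt") after the stop request would leave a file that can be posted to the list if the process had to be killed.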
