I just reviewed the places where JobQueue status is updated to be sure that all places this happens are included in the diagnostics trackiing. They were, save for when a cluster gets restarted, and I added that to the code in trunk as well.
So the analysis, above, stands. :-( Karl On Thu, Jun 6, 2019 at 4:28 PM Karl Wright <[email protected]> wrote: > I just looked at this briefly. > > For > https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204, > look for the following record ID: > > 1522147023170 > > This is the one when we look at it after the document is processed that it > has status 4: > > >>>>>> > > manifoldcf_1 | 2019-06-06T02:01:05.380401275Z 2019-06-06T04:01:05,380 > DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: ==== Forensics > for record 1522147023170, current status: 4 ==== > > manifoldcf_1 | 2019-06-06T02:01:05.380512360Z 2019-06-06T04:01:05,380 > DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: === Current > stack trace === > manifoldcf_1 | 2019-06-06T02:01:05.380544192Z java.lang.Exception: > Forensics stack trace > manifoldcf_1 | 2019-06-06T02:01:05.380548494Z at > org.apache.manifoldcf.crawler.jobs.TrackerClass.printForensics(TrackerClass.java:216) > [mcf-pull-agent.jar:?] > manifoldcf_1 | 2019-06-06T02:01:05.380552564Z at > org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1028) > [mcf-pull-agent.jar:?] > manifoldcf_1 | 2019-06-06T02:01:05.380568210Z at > org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3414) > [mcf-pull-agent.jar:?] > > <<<<<< > > It expects it to have status 5, though, because the only way the document > could be being worked on at all is if the document had been put into status 5 > by the stuffer thread: > > >>>>>> > > manifoldcf_1 | 2019-06-06T02:01:05.387210733Z 2019-06-06T04:01:05,381 > DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: == About to > commit transaction by 'Stuffer thread' at 1559786464972 == > manifoldcf_1 | 2019-06-06T02:01:05.387250827Z java.lang.Exception: > Precommit stack trace > manifoldcf_1 | 2019-06-06T02:01:05.387256464Z at > org.apache.manifoldcf.crawler.jobs.TrackerClass.notePrecommit(TrackerClass.java:108) > ~[mcf-pull-agent.jar:?] > manifoldcf_1 | 2019-06-06T02:01:05.387260987Z at > org.apache.manifoldcf.crawler.jobs.JobManager.fetchAndProcessDocuments(JobManager.java:3295) > ~[mcf-pull-agent.jar:?] > manifoldcf_1 | 2019-06-06T02:01:05.387265034Z at > org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:3014) > ~[mcf-pull-agent.jar:?] > manifoldcf_1 | 2019-06-06T02:01:05.387269073Z at > org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186) > ~[mcf-pull-agent.jar:?] > manifoldcf_1 | 2019-06-06T02:01:05.387285663Z 2019-06-06T04:01:05,387 > DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: Transaction > includes: > manifoldcf_1 | 2019-06-06T02:01:05.387290117Z 2019-06-06T04:01:05,387 > DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: Record > 1522147023170 status modified to 5: Make active > > ... > > <<<<<< > > The actual commit follows a short while later to actually set the status, and > only AFTER that are the documents queued to be handed to the worker threads. > > So what this is saying is that the stuffer thread sets the document status to > 5 and then a worker thread gets it and when the worker thread is done it > tries to set the status back to 4 -- but it's ALREADY at 4 and that is not > possible. > > Now, it may be worth examining the possibility that some other job gets to > the document while it is waiting to be queued, and does stuff to the > document's jobqueue record. There are a couple of background threads which > (for example) reprioritize documents that might conceivably change the > jobqueue record's status. They generally work by changing the record status > in a transaction to some other value that would not make the record eligible > for queuing, and then changing it back when done. But unless completely > overlooked, these would, however, be visible in the diagnostics, and > furthermore it should not be possible for one thread in a transaction to > change the status at the same time as another thread in a transaction changes > the status. > > So either way I'm forced to conclude that a record gets changed and > committed, but the change is not visible to another thread later trying to > reset the status. > > Karl > > > > > On Thu, Jun 6, 2019 at 5:58 AM Karl Wright <[email protected]> wrote: > >> I can look at the log output but not until this weekend. >> >> Karl >> >> >> On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch < >> [email protected]> wrote: >> >>> Hi Olivier, >>> >>> >>> >>> we were not able to fix this yet. >>> >>> >>> >>> But now we have new diagnostics log data, after the error occurred again >>> yesterday: >>> >>> >>> >>> Unexpected jobqueue status - record id 1522147023170, expecting active >>> status, saw 4 >>> >>> >>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204 >>> >>> >>> >>> Unexpected jobqueue status - record id 1541529864711, expecting active >>> status, saw 4 >>> >>> >>> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204 >>> >>> >>> >>> @Karl, maybe you can get some ideas from that log data? >>> >>> >>> >>> Cheers, >>> >>> Markus >>> >>> >>> >>> *Von:* Olivier Tavard <[email protected]> >>> *Gesendet:* Dienstag, 21. Mai 2019 11:00 >>> *An:* [email protected] >>> *Betreff:* Re: Error: Unexpected jobqueue status - record id X, >>> expecting active status, saw 4 (MySQL compatible Database) >>> >>> >>> >>> Hi Markus, >>> >>> >>> >>> We have the same error (with postgresql database). Did the error occur >>> again since your last mail ? >>> >>> Did you change something on your MCF configuration to fix this ? >>> >>> >>> >>> Thanks, >>> >>> Best regards, >>> >>> >>> >>> Olivier >>> >>> >>> >>> >>> >>> Le 13 févr. 2019 à 13:58, Markus Schuch <[email protected]> >>> a écrit : >>> >>> >>> >>> Hi Karl, >>> >>> >>> >>> we set the diagnostigs logger to level debug. >>> >>> >>> >>> I will get back when the error occurs again. >>> >>> >>> >>> Cheers, >>> >>> Markus >>> >>> >>> ------------------------------ >>> >>> >>> Pflichtangaben anzeigen >>> <http://www.deutschebahn.com/pflichtangaben/20190211> >>> >>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier: >>> http://www.deutschebahn.com/de/konzern/datenschutz >>> >>> >>> >>> Le 12 févr. 2019 à 17:41, Karl Wright <[email protected]> a écrit : >>> >>> >>> >>> Hi Marcus, >>> >>> >>> >>> There's a properties.xml debugging logger you can enable that will keep >>> track of what's happening with transactions, so that when an error of this >>> kinds is reported, information about why the situation is unexpected is >>> dumped to the log. The logger is called "diagnostics" e.g. >>> "org.apache.manifoldcf.diagnostics". >>> >>> >>> >>> Karl >>> >>> >>> >>> >>> >>> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch < >>> [email protected]> wrote: >>> >>> Hi, >>> >>> we are seeing "Error: Unexpected jobqueue status - record id >>> 1484612513829, expecting active status, saw 4" from time to time. >>> I didn’t report it, because we were running on an old MCF version, and >>> there were some bugreports relating to this error that are resolved in >>> newer versions. >>> >>> Now we have upgraded to latest and greatest and the error still >>> occurred. So want to start to track this down. >>> >>> Out setup is: >>> >>> - ManifoldCF 2.12, running in a Docker Container based on Redhat >>> Linux, OpenJDK 8 >>> - AWS RDS Database (Aurora MySQL -> 5.6 compatible) >>> - Single Process Setup >>> >>> >>> We run several Jobs (run once mode) over night with schedule windows and >>> max runtime settings. Some of the time windows are overlapping. >>> At normal days some Jobs finish during their time window, some go to >>> WAITING and will go on in the next night. >>> >>> The error mostly hits at a Sharepoint Repository Job. >>> We have the impression, that the error is somehow related to situations, >>> when service interruptions (e.g. connection issues) occur in other jobs. >>> >>> >>> >>> Maybe all this is related to fundamental problems with the database or >>> the JDBC driver >>> >>> (Karl expressed his concerns about that in the maybe-related >>> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket) >>> >>> On the other hand, there were similar bugreports in the past concerning >>> unexpected jobqueue status and there were resolved. >>> >>> Mayby there is a chance this can also be analyzed and fixed, but I do >>> not really know where to start. >>> >>> >>> Has anybody ideas how we can track this down / debug this to get to the >>> bottom of the problem? >>> What could be the first step in the analysis? >>> >>> Many thanks in advance >>> Markus >>> >>> May be related bugreports: >>> >>> >>> >>> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395 >>> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180 >>> [2] https://issues.apache.org/jira/browse/CONNECTORS-590 >>> [3] https://issues.apache.org/jira/browse/CONNECTORS-246 >>> >>> >>> >>> -- >>> >>> Markus Schuch >>> >>> Web Business (T.IPB 26) >>> >>> >>> >>> DB Systel GmbH >>> >>> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main >>> >>> >>> ------------------------------ >>> >>> >>> Pflichtangaben anzeigen >>> <http://www.deutschebahn.com/pflichtangaben/20190211> >>> >>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie >>> hier: http://www.deutschebahn.com/de/konzern/datenschutz >>> >>> >>> >>> >>> ------------------------------ >>> >>> Pflichtangaben anzeigen >>> <http://www.deutschebahn.com/pflichtangaben/20190529> >>> >>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie >>> hier: http://www.deutschebahn.com/de/konzern/datenschutz >>> >>
