I just reviewed the places where JobQueue status is updated to be sure that
all places this happens are included in the diagnostics trackiing.  They
were, save for when a cluster gets restarted, and I added that to the code
in trunk as well.

So the analysis, above, stands.

:-(

Karl


On Thu, Jun 6, 2019 at 4:28 PM Karl Wright <[email protected]> wrote:

> I just looked at this briefly.
>
> For
> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204,
> look for the following record ID:
>
> 1522147023170
>
> This is the one when we look at it after the document is processed that it 
> has status 4:
>
> >>>>>>
>
> manifoldcf_1     | 2019-06-06T02:01:05.380401275Z 2019-06-06T04:01:05,380 
> DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: ==== Forensics 
> for record 1522147023170, current status: 4 ====
>
> manifoldcf_1     | 2019-06-06T02:01:05.380512360Z 2019-06-06T04:01:05,380 
> DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: === Current 
> stack trace ===
> manifoldcf_1     | 2019-06-06T02:01:05.380544192Z java.lang.Exception: 
> Forensics stack trace
> manifoldcf_1     | 2019-06-06T02:01:05.380548494Z     at 
> org.apache.manifoldcf.crawler.jobs.TrackerClass.printForensics(TrackerClass.java:216)
>  [mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.380552564Z     at 
> org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1028)
>  [mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.380568210Z     at 
> org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3414)
>  [mcf-pull-agent.jar:?]
>
> <<<<<<
>
> It expects it to have status 5, though, because the only way the document 
> could be being worked on at all is if the document had been put into status 5 
> by the stuffer thread:
>
> >>>>>>
>
> manifoldcf_1     | 2019-06-06T02:01:05.387210733Z 2019-06-06T04:01:05,381 
> DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics: == About to 
> commit transaction by 'Stuffer thread' at 1559786464972 ==
> manifoldcf_1     | 2019-06-06T02:01:05.387250827Z java.lang.Exception: 
> Precommit stack trace
> manifoldcf_1     | 2019-06-06T02:01:05.387256464Z     at 
> org.apache.manifoldcf.crawler.jobs.TrackerClass.notePrecommit(TrackerClass.java:108)
>  ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387260987Z     at 
> org.apache.manifoldcf.crawler.jobs.JobManager.fetchAndProcessDocuments(JobManager.java:3295)
>  ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387265034Z     at 
> org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:3014)
>  ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387269073Z     at 
> org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186)
>  ~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387285663Z 2019-06-06T04:01:05,387 
> DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:     Transaction 
> includes:
> manifoldcf_1     | 2019-06-06T02:01:05.387290117Z 2019-06-06T04:01:05,387 
> DEBUG [Worker thread '37'] org.apache.manifoldcf.diagnostics:       Record 
> 1522147023170 status modified to 5: Make active
>
> ...
>
> <<<<<<
>
> The actual commit follows a short while later to actually set the status, and 
> only AFTER that are the documents queued to be handed to the worker threads.
>
> So what this is saying is that the stuffer thread sets the document status to 
> 5 and then a worker thread gets it and when the worker thread is done it 
> tries to set the status back to 4 -- but it's ALREADY at 4 and that is not 
> possible.
>
> Now, it may be worth examining the possibility that some other job gets to 
> the document while it is waiting to be queued, and does stuff to the 
> document's jobqueue record.  There are a couple of background threads which 
> (for example) reprioritize documents that might conceivably change the 
> jobqueue record's status.  They generally work by changing the record status 
> in a transaction to some other value that would not make the record eligible 
> for queuing, and then changing it back when done.  But unless completely 
> overlooked, these would, however, be visible in the diagnostics, and 
> furthermore it should not be possible for one thread in a transaction to 
> change the status at the same time as another thread in a transaction changes 
> the status.
>
> So either way I'm forced to conclude that a record gets changed and 
> committed, but the change is not visible to another thread later trying to 
> reset the status.
>
> Karl
>
>
>
>
> On Thu, Jun 6, 2019 at 5:58 AM Karl Wright <[email protected]> wrote:
>
>> I can look at the log output but not until this weekend.
>>
>> Karl
>>
>>
>> On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch <
>> [email protected]> wrote:
>>
>>> Hi Olivier,
>>>
>>>
>>>
>>> we were not able to fix this yet.
>>>
>>>
>>>
>>> But now we have new diagnostics log data, after the error occurred again
>>> yesterday:
>>>
>>>
>>>
>>> Unexpected jobqueue status - record id 1522147023170, expecting active
>>> status, saw 4
>>>
>>>
>>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204
>>>
>>>
>>>
>>> Unexpected jobqueue status - record id 1541529864711, expecting active
>>> status, saw 4
>>>
>>>
>>> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204
>>>
>>>
>>>
>>> @Karl, maybe you can get some ideas from that log data?
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>>
>>>
>>> *Von:* Olivier Tavard <[email protected]>
>>> *Gesendet:* Dienstag, 21. Mai 2019 11:00
>>> *An:* [email protected]
>>> *Betreff:* Re: Error: Unexpected jobqueue status - record id X,
>>> expecting active status, saw 4 (MySQL compatible Database)
>>>
>>>
>>>
>>> Hi Markus,
>>>
>>>
>>>
>>> We have the same error (with postgresql database). Did the error occur
>>> again since your last mail ?
>>>
>>> Did you change something on your MCF configuration to fix this ?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Best regards,
>>>
>>>
>>>
>>> Olivier
>>>
>>>
>>>
>>>
>>>
>>> Le 13 févr. 2019 à 13:58, Markus Schuch <[email protected]>
>>> a écrit :
>>>
>>>
>>>
>>> Hi Karl,
>>>
>>>
>>>
>>> we set the diagnostigs logger to level debug.
>>>
>>>
>>>
>>> I will get back when the error occurs again.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>>
>>> ------------------------------
>>>
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
>>>  http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>>
>>>
>>> Le 12 févr. 2019 à 17:41, Karl Wright <[email protected]> a écrit :
>>>
>>>
>>>
>>> Hi Marcus,
>>>
>>>
>>>
>>> There's a properties.xml debugging logger you can enable that will keep
>>> track of what's happening with transactions, so that when an error of this
>>> kinds is reported, information about why the situation is unexpected is
>>> dumped to the log.  The logger is called "diagnostics" e.g.
>>> "org.apache.manifoldcf.diagnostics".
>>>
>>>
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <
>>> [email protected]> wrote:
>>>
>>> Hi,
>>>
>>> we are seeing "Error: Unexpected jobqueue status - record id
>>> 1484612513829, expecting active status, saw 4" from time to time.
>>> I didn’t report it, because we were running on an old MCF version, and
>>> there were some bugreports relating to this error that are resolved in
>>> newer versions.
>>>
>>> Now we have upgraded to latest and greatest and the error still
>>> occurred. So want to start to track this down.
>>>
>>> Out setup is:
>>>
>>>    - ManifoldCF 2.12, running in a Docker Container based on Redhat
>>>    Linux, OpenJDK 8
>>>    - AWS RDS Database (Aurora MySQL -> 5.6 compatible)
>>>    - Single Process Setup
>>>
>>>
>>> We run several Jobs (run once mode) over night with schedule windows and
>>> max runtime settings. Some of the time windows are overlapping.
>>> At normal days some Jobs finish during their time window, some go to
>>> WAITING and will go on in the next night.
>>>
>>> The error mostly hits at a Sharepoint Repository Job.
>>> We have the impression, that the error is somehow related to situations,
>>> when service interruptions (e.g. connection issues) occur in other jobs.
>>>
>>>
>>>
>>> Maybe all this is related to fundamental problems with the database or
>>> the JDBC driver
>>>
>>> (Karl expressed his concerns about that in the maybe-related
>>> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
>>>
>>> On the other hand, there were similar bugreports in the past concerning
>>> unexpected jobqueue status and there were resolved.
>>>
>>> Mayby there is a chance this can also be analyzed and fixed, but I do
>>> not really know where to start.
>>>
>>>
>>> Has anybody ideas how we can track this down / debug this to get to the
>>> bottom of the problem?
>>> What could be the first step in the analysis?
>>>
>>> Many thanks in advance
>>> Markus
>>>
>>> May be related bugreports:
>>>
>>>
>>>
>>> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395
>>> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180
>>> [2] https://issues.apache.org/jira/browse/CONNECTORS-590
>>> [3] https://issues.apache.org/jira/browse/CONNECTORS-246
>>>
>>>
>>>
>>> --
>>>
>>> Markus Schuch
>>>
>>> Web Business (T.IPB 26)
>>>
>>>
>>>
>>> DB Systel GmbH
>>>
>>> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
>>>
>>>
>>> ------------------------------
>>>
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190529>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>

Reply via email to