Thanks for looking at this issue, Karl.
Yes, the db tables may be corrupted as a result of a lot of debugging I did in May.
Anyway, I have used tcpdump in order to investigate the traffic further.

1. MCF tries to fetch http://www.ibsen.uio.no/
2. Server response is 302 (redirected to http://www.ibsen.uio.no/forside.xhtml)
3. MCF tries to fetch http://www.ibsen.uio.no/forside.xhtml
4. Server response is 200

And that's all. Then we know that there is no server error involved.

Log from tcpdump:
http://folk.uio.no/erlendfg/manifoldcf/chatter.dmp

Erlend

On 8/13/13 3:16 PM, Karl Wright wrote:
Hmm. This is not at all what I would have expected. If "skuespill" is directly referenced by a seed document, or (worse) is in the seed list, I cannot see *how* the document can possibly have this state.

- the referencing document definitely has a parseable reference to the document in question, and in any case having it be a "seed" should make the hopcount be zero;
- if the reference is being filtered, it would be filtered from everywhere, and the document should thus get removed from the queue at the end of the job, because it is unreachable.
- even if the hopcount tables have gotten corrupted, the fact that the document is a first-level reference or a seed should overwrite the record for that document.

So I am at a complete loss to explain this behavior. Let me look through the code and see if I can find any code path that could lead to this behavior.

Karl

On Tue, Aug 13, 2013 at 9:01 AM, Erlend Garåsen <[email protected]> wrote:

On 8/13/13 2:47 PM, Karl Wright wrote:

Looks like you need to re-enable connector debugging before we can see anything.

Unfortunately, yes. A boring task which must be done.

Also, does the missing document (skuespill) appear in the Document Status report after the crawl? Can you include that here if it does? (I am betting it does not...)

I added 60 mins as a time offset value, but I'm not 100% sure whether the given result from Document Status was created by this job run or is an old entry in the database:

Identifier: http://www.ibsen.uio.no/skuespill.xhtml
Job: Ibsen
State: Out of scope
Status: Hopcount exceeded
Scheduled: 01-01-1970 01:00:00.000
Scheduled action: Process
Retry count / limit: N/A

Erlend
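
For reference, the hopcount reasoning in this thread (a seed has hopcount zero, a first-level reference has hopcount one, and anything beyond the limit is out of scope) can be sketched as a breadth-first traversal. This is only an illustration and not ManifoldCF's actual implementation: the function name `compute_hopcounts`, the `max_hops` parameter, and the example.org link graph are all hypothetical.

```python
from collections import deque


def compute_hopcounts(seeds, links, max_hops):
    """Return {url: hopcount} for every URL reachable within max_hops.

    Hypothetical helper for illustration only; not ManifoldCF code.
    `links` maps a URL to the list of URLs it references.
    """
    # Seeds always start at hopcount zero.
    hopcounts = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        next_hop = hopcounts[url] + 1
        if next_hop > max_hops:
            # References from this document would exceed the limit.
            continue
        for target in links.get(url, []):
            # Breadth-first order means the first visit is the shortest path.
            if target not in hopcounts:
                hopcounts[target] = next_hop
                queue.append(target)
    return hopcounts


# Hypothetical link graph, not taken from the real site.
links = {
    "http://example.org/": ["http://example.org/plays.xhtml"],
    "http://example.org/plays.xhtml": ["http://example.org/deep.xhtml"],
}
hopcounts = compute_hopcounts(["http://example.org/"], links, max_hops=1)
```

Under this model a document referenced directly by a seed can never end up as "Hopcount exceeded" when the limit is at least one, which is why the reported state is surprising.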
