Thanks for looking at this issue, Karl.

Yes, the db tables may be corrupted as a result of a lot of debugging I did in May.

Anyway, I have used tcpdump in order to investigate the traffic further.

1. MCF tries to fetch http://www.ibsen.uio.no/
2. Server response is 302 (redirected to
http://www.ibsen.uio.no/forside.xhtml)
3. MCF tries to fetch http://www.ibsen.uio.no/forside.xhtml
4. Server response is 200

And that's all, so we know that no server error is involved.
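
For anyone who wants to repeat the check without tcpdump, the four steps above can be sketched roughly as follows. This is only an illustration and not how MCF fetches documents: `fetch_once` and `trace_redirects` are names made up for this example, and the hop limit of 5 is an arbitrary assumption.

```python
import http.client
from urllib.parse import urlsplit, urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_once(url, timeout=10):
    """Issue a single GET without following redirects.

    Returns (status, value of the Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_once, max_hops=5):
    """Follow redirects by hand and record (url, status) for every request."""
    chain = []
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        chain.append((url, status))
        if status in REDIRECT_CODES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return chain


if __name__ == "__main__":
    for fetched_url, status in trace_redirects("http://www.ibsen.uio.no/"):
        print(status, fetched_url)
```

The `fetch` parameter is there so the redirect logic can be exercised with canned responses instead of hitting the server.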

Log from tcpdump:
http://folk.uio.no/erlendfg/manifoldcf/chatter.dmp

Erlend

On 8/13/13 3:16 PM, Karl Wright wrote:
Hmm. This is not at all what I would have expected.

If "skuespill" is directly referenced by a seed document, or (worse) is
in the seed list, I cannot see *how* the document can possibly have this
state.

- the referencing document definitely has a parseable reference to the
document in question, and in any case having it be a "seed" should make
the hopcount be zero;
- if the reference is being filtered, it would be filtered from
everywhere, and the document should thus get removed from the queue at
the end of the job, because it is unreachable.
- even if the hopcount tables have gotten corrupted, the fact that the
document is a first-level reference or a seed should overwrite the
record for that document.
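
As a rough illustration of the reasoning in the three points above (this is a toy model, not ManifoldCF's actual hopcount code, and the document names and the `max_hops` limit are hypothetical), hop counts can be thought of as breadth-first distances from the seeds:

```python
from collections import deque


def hop_counts(seeds, links):
    """Breadth-first hop distance from the nearest seed; seeds are always 0."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, ()):
            if target not in dist:
                dist[target] = dist[doc] + 1
                queue.append(target)
    return dist


def classify(doc, dist, max_hops):
    """Label a document the way the three cases above describe."""
    if doc not in dist:
        return "unreachable"  # would be removed from the queue at job end
    if dist[doc] <= max_hops:
        return "in scope"
    return "hopcount exceeded"


# Hypothetical link graph: "seed" links to "first", which links to "second".
links = {"seed": ["first"], "first": ["second"]}
dist = hop_counts(["seed"], links)
for doc in ("seed", "first", "second", "orphan"):
    print(doc, classify(doc, dist, max_hops=1))
```

Under this model a seed or a first-level reference can only be reported as "hopcount exceeded" if the limit is below its distance of 0 or 1, which is why the reported state is so surprising.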

So I am at a complete loss to explain this behavior.

Let me look through the code and see if I can find any code path that
could lead to this behavior.
Karl


On Tue, Aug 13, 2013 at 9:01 AM, Erlend Garåsen wrote:

    On 8/13/13 2:47 PM, Karl Wright wrote:

        Looks like you need to re-enable connector debugging before we
        can see anything.


    Unfortunately, yes. A boring task which must be done.


        Also, does the missing document (skuespill) appear in the Document
        Status report after the crawl?  Can you include that here if it
        does?  (I am betting it does not...)


    I added 60 mins as a time offset value, but I'm not 100% sure
    whether the given result from Document status was created by this
    job run or is an old entry in the database:

    Identifier: http://www.ibsen.uio.no/skuespill.xhtml

    Job: Ibsen
    State: Out of scope
    Status: Hopcount exceeded

    Scheduled: 01-01-1970 01:00:00.000
    Scheduled action: Process
    Retry count / limit: N/A

    Erlend


