Thanks, I will do that tomorrow and report back afterwards. I hope we will find a simple explanation. :)

E

On 8/12/13 5:07 PM, Karl Wright wrote:
Hi Erlend,

You have wire logging (httpclient) enabled, which is useful for
debugging fetch issues, but you do not have connector debugging on.  To
turn it on, add this to properties.xml:

<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>

thanks,
Karl
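
Once connector debugging is enabled, the log can be searched for the seed URLs to see what was recorded about each one. The following is only a minimal sketch: the helper name `find_url_mentions` is hypothetical, and because the layout of the debug lines is not shown in this thread, it assumes nothing beyond the URL appearing verbatim somewhere in the line.

```python
from typing import Dict, Iterable, List


def find_url_mentions(log_lines: Iterable[str], urls: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the URL they mention, using a plain substring match.

    Assumption: the log line format is unknown, so no parsing is attempted.
    """
    # Try longer URLs first, so a line about a specific page is not
    # attributed to a shorter URL that is merely its prefix (e.g. the site root).
    ordered = sorted(urls, key=len, reverse=True)
    hits: Dict[str, List[str]] = {url: [] for url in ordered}
    for line in log_lines:
        for url in ordered:
            if url in line:
                hits[url].append(line.rstrip("\n"))
                break  # each line is recorded under one URL only
    return hits
```

It could be called with an open file handle for the log (for example `open("manifoldcf.log")`, an assumed local copy) and the seed list; a URL that maps to an empty list was never mentioned in the log at all.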


On Mon, Aug 12, 2013 at 10:53 AM, Erlend Garåsen
<[email protected]> wrote:

    On 8/12/13 4:29 PM, Karl Wright wrote:

        Hi Erlend,

        The Document Status report shows these documents because they are
        still in the queue.  There could be several reasons for this.
        Documents that exceed the hopcount by 1 level are allowed to remain
        in the queue for bookkeeping purposes.  The "scheduled date" as given
        is only meaningful if the document is in an active state; my guess is
        that these documents are not in fact in that state, but rather in the
        state HOPCOUNT_EXCEEDED.  Can you include one complete row from the
        Document Status report for one of the missing documents?
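
The bookkeeping behaviour described above can be illustrated with a small breadth-first sketch. This is not ManifoldCF's implementation: the function `crawl_states`, the `links` map, and the state label `PROCESSED` are hypothetical names chosen for illustration; only `HOPCOUNT_EXCEEDED` comes from the message above.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def crawl_states(
    links: Dict[str, List[str]], seeds: Iterable[str], max_hops: int
) -> Dict[str, Tuple[int, str]]:
    """Breadth-first walk returning {url: (hop_count, state)}.

    Assumption: `links` maps each URL to the URLs it links to.
    """
    states: Dict[str, Tuple[int, str]] = {}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        url, hops = queue.popleft()
        if url in states:
            continue  # already reached by a path that is at least as short
        if hops > max_hops:
            # One level past the limit: kept for bookkeeping,
            # but never fetched and never expanded further.
            states[url] = (hops, "HOPCOUNT_EXCEEDED")
            continue
        states[url] = (hops, "PROCESSED")
        for target in links.get(url, []):
            if target not in states:
                queue.append((target, hops + 1))
    return states
```

With a chain a -> b -> c -> d, seed a and a limit of 1 hop, a and b are processed, c is retained as HOPCOUNT_EXCEEDED, and d is never recorded because c is not expanded.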


    For "http://www.ibsen.uio.no/sakprosa.xhtml":
    Job: Ibsen

    State: Out of scope
    Status: Hopcount exceeded
    Scheduled: 01-01-1970 01:00:00.000
    Scheduled action: Process
    Retry count: N/A
    Retry limit: N/A


        When you added documents to the seed list, what did the Simple
        History say when they were fetched?  If they don't appear in the
        Simple History, they SHOULD nevertheless have appeared in the log,
        with an explanation of why they were excluded, provided you have
        connector debugging enabled.


    OK, here is the seed list:
    http://www.ibsen.uio.no/

    http://www.ibsen.uio.no/skuespill.xhtml
    http://www.ibsen.uio.no/dikt.xhtml
    http://www.ibsen.uio.no/brev.xhtml
    http://www.ibsen.uio.no/sakprosa.xhtml
    http://www.ibsen.uio.no/varia.xhtml
    http://www.ibsen.uio.no/undervisningsressurser.xhtml

    Here are the results from the Simple History:
    08-12-2013 16:46:26.536  job end                 1368534065016(Ibsen)                         0      1
    08-12-2013 16:46:09.927  document ingest (Solr)  http://www.ibsen.uio.no/forside.xhtml  OK    11897  178
    08-12-2013 16:46:09.751  fetch                   http://www.ibsen.uio.no/forside.xhtml  200   11897  17
    08-12-2013 16:44:48.829  fetch                   http://www.ibsen.uio.no/               302   0      79484
    08-12-2013 16:44:48.727  robots parse            www.ibsen.uio.no:80                    HTML  0      2      Robots file contained HTML, skipped
    08-12-2013 16:44:46.574  job start               1368534065016(Ibsen)                         0      1
             1

    HttpClient log:
    http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log

    Erlend


