Hi Erlend,

You have wire logging (httpclient) enabled, which is useful for debugging
fetch issues, but you do not have connector debugging on.  To turn it on,
add this to properties.xml:

<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
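If you'd rather script the change than edit by hand, here is a minimal Python sketch. It assumes properties.xml has a single root element (typically <configuration>) containing <property name="..." value="..."/> children. The function name enable_connector_debug is my own and is not part of ManifoldCF.

```python
import xml.etree.ElementTree as ET

# Logger property from the message above; DEBUG turns on connector debugging.
CONNECTOR_LOGGER = "org.apache.manifoldcf.connectors"


def enable_connector_debug(path: str) -> bool:
    """Set org.apache.manifoldcf.connectors to DEBUG in a properties.xml file.

    Assumes the root element holds <property name=... value=.../> children.
    Returns True if the property was added, False if an existing one was updated.
    """
    # Keep comments inside the root element when rewriting (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser)
    root = tree.getroot()

    for prop in root.findall("property"):
        if prop.get("name") == CONNECTOR_LOGGER:
            # Already present: just raise its level to DEBUG.
            prop.set("value", "DEBUG")
            tree.write(path, encoding="utf-8", xml_declaration=True)
            return False

    # Not present yet: append a new <property> element under the root.
    ET.SubElement(root, "property", {"name": CONNECTOR_LOGGER, "value": "DEBUG"})
    tree.write(path, encoding="utf-8", xml_declaration=True)
    return True
```

Usage: enable_connector_debug("/path/to/properties.xml"), where the path is a placeholder for your own install. Running it twice is harmless, because the second call only updates the existing entry.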

thanks,
Karl


On Mon, Aug 12, 2013 at 10:53 AM, Erlend Garåsen <[email protected]> wrote:

> On 8/12/13 4:29 PM, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> The Document Status report shows these documents because they are still
>> in the queue.  There could be several reasons for this.  Documents that
>> exceed the hopcount by 1 level are allowed to remain in the queue for
>> bookkeeping purposes.  The "scheduled date" as given is only meaningful if
>> the document is in an active state; my guess is that these documents are
>> not in fact in that state, but rather in the state HOPCOUNT_EXCEEDED.  Can
>> you include one complete row from the Document Status report for one of
>> the missing documents?
>>
>
> For http://www.ibsen.uio.no/sakprosa.xhtml:
> Job: Ibsen
>
> State: Out of scope
> Status: Hopcount exceeded
> Scheduled: 01-01-1970 01:00:00.000
> Scheduled action: Process
> Retry count: N/A
> Retry limit: N/A
>
>
>> When you added documents to the seed list, what did the Simple History
>> say when they were fetched?  If they don't appear in the Simple History,
>> they SHOULD nevertheless have appeared in the log, with an explanation
>> of why they were excluded, provided you have connector debugging enabled.
>>
>
> OK, here is the seed list:
> http://www.ibsen.uio.no/
> http://www.ibsen.uio.no/skuespill.xhtml
> http://www.ibsen.uio.no/dikt.xhtml
> http://www.ibsen.uio.no/brev.xhtml
> http://www.ibsen.uio.no/sakprosa.xhtml
> http://www.ibsen.uio.no/varia.xhtml
> http://www.ibsen.uio.no/undervisningsressurser.xhtml
>
> Here are the results from the Simple History:
> 08-12-2013 16:46:26.536  job end                 1368534065016(Ibsen)                         0      1
> 08-12-2013 16:46:09.927  document ingest (Solr)  http://www.ibsen.uio.no/forside.xhtml  OK    11897  178
> 08-12-2013 16:46:09.751  fetch                   http://www.ibsen.uio.no/forside.xhtml  200   11897  17
> 08-12-2013 16:44:48.829  fetch                   http://www.ibsen.uio.no/               302   0      79484
> 08-12-2013 16:44:48.727  robots parse            www.ibsen.uio.no:80                    HTML  0      2      Robots file contained HTML, skipped
> 08-12-2013 16:44:46.574  job start               1368534065016(Ibsen)                         0      1      1
>
> HttpClient log:
> http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>
> Erlend
>
>
