Hi Erlend,

You have wire logging (httpclient) enabled, which is useful for debugging fetch issues, but you do not have connector debugging turned on. To turn it on, add this to properties.xml:
<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>

Thanks,
Karl

On Mon, Aug 12, 2013 at 10:53 AM, Erlend Garåsen <[email protected]> wrote:

> On 8/12/13 4:29 PM, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> The Document Status report shows these documents because they are still
>> in the queue. There could be several reasons for this. Documents that
>> exceed the hopcount by 1 level are allowed to remain in the queue for
>> bookkeeping purposes. The "scheduled date" as given is only meaningful if
>> the document is in an active state; my guess is that these documents are
>> not in fact in that state, but rather in the state HOPCOUNT_EXCEEDED. Can
>> you include one complete row from the Document Status report for one of
>> the missing documents?
>>
>
> For "http://www.ibsen.uio.no/sakprosa.xhtml":
>
> Job: Ibsen
> State: Out of scope
> Status: Hopcount exceeded
> Scheduled: 01-01-1970 01:00:00.000
> Scheduled action: Process
> Retry count: N/A
> Retry limit: N/A
>
>
>> When you added documents to the seed list, what did the Simple History
>> say when they were fetched? If they don't appear in the Simple History,
>> they SHOULD nevertheless have appeared in the log, with an explanation
>> of why they were excluded, provided you have connector debugging enabled.
>>
>
> OK, here is the seed list:
>
> http://www.ibsen.uio.no/
> http://www.ibsen.uio.no/skuespill.xhtml
> http://www.ibsen.uio.no/dikt.xhtml
> http://www.ibsen.uio.no/brev.xhtml
> http://www.ibsen.uio.no/sakprosa.xhtml
> http://www.ibsen.uio.no/varia.xhtml
> http://www.ibsen.uio.no/undervisningsressurser.xhtml
>
> Here are the results from the Simple History:
>
> Start Time               Activity                Identifier                             Result Code  Bytes  Time (ms)  Result Description
> 08-12-2013 16:46:26.536  job end                 1368534065016(Ibsen)                                0      1
> 08-12-2013 16:46:09.927  document ingest (Solr)  http://www.ibsen.uio.no/forside.xhtml  OK           11897  178
> 08-12-2013 16:46:09.751  fetch                   http://www.ibsen.uio.no/forside.xhtml  200          11897  17
> 08-12-2013 16:44:48.829  fetch                   http://www.ibsen.uio.no/               302          0      79484
> 08-12-2013 16:44:48.727  robots parse            www.ibsen.uio.no:80                    HTML         0      2          Robots file contained HTML, skipped
> 08-12-2013 16:44:46.574  job start               1368534065016(Ibsen)                                0      1
>
> HttpClient log:
> http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>
> Erlend
