Thanks, I will do that tomorrow and report back afterwards. I hope we will find a simple explanation. :)

E

On 8/12/13 5:07 PM, Karl Wright wrote:

> Hi Erlend,
>
> You have wire logging (httpclient) enabled, which is useful for debugging fetch issues, but you do not have connector debugging on. To turn it on, add this to properties.xml:
>
> <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
>
> Thanks,
> Karl
>
> On Mon, Aug 12, 2013 at 10:53 AM, Erlend Garåsen <[email protected]> wrote:
>
>> On 8/12/13 4:29 PM, Karl Wright wrote:
>>
>>> Hi Erlend,
>>>
>>> The Document Status report shows these documents because they are still in the queue. There could be several reasons for this. Documents that exceed the hopcount by 1 level are allowed to remain in the queue for bookkeeping purposes. The "scheduled date" as given is only meaningful if the document is in an active state; my guess is that these documents are not in fact in that state, but rather in the state HOPCOUNT_EXCEEDED.
>>>
>>> Can you include one complete row from the Document Status report for one of the missing documents?
>>
>> For "http://www.ibsen.uio.no/sakprosa.xhtml":
>>
>> Job: Ibsen
>> State: Out of scope
>> Status: Hopcount exceeded
>> Scheduled: 01-01-1970 01:00:00.000
>> Scheduled action: Process
>> Retry count: N/A
>> Retry limit: N/A
>>
>>> When you added documents to the seed list, what did the Simple History say when they were fetched? If they don't appear in the Simple History, they SHOULD nevertheless have appeared in the log, with an explanation of why they were excluded, provided you have connector debugging enabled.
>>
>> OK, here is the seed list:
>>
>> http://www.ibsen.uio.no/
>> http://www.ibsen.uio.no/skuespill.xhtml
>> http://www.ibsen.uio.no/dikt.xhtml
>> http://www.ibsen.uio.no/brev.xhtml
>> http://www.ibsen.uio.no/sakprosa.xhtml
>> http://www.ibsen.uio.no/varia.xhtml
>> http://www.ibsen.uio.no/undervisningsressurser.xhtml
>>
>> Here are the results from the Simple History:
>>
>> 08-12-2013 16:46:26.536 job end 1368534065016(Ibsen) 0 1
>> 08-12-2013 16:46:09.927 document ingest (Solr) http://www.ibsen.uio.no/forside.xhtml OK 11897 178
>> 08-12-2013 16:46:09.751 fetch http://www.ibsen.uio.no/forside.xhtml 200 11897 17
>> 08-12-2013 16:44:48.829 fetch http://www.ibsen.uio.no/ 302 0 79484
>> 08-12-2013 16:44:48.727 robots parse www.ibsen.uio.no:80 HTML 0 2 Robots file contained HTML, skipped
>> 08-12-2013 16:44:46.574 job start 1368534065016(Ibsen) 0 1 1
>>
>> HttpClient log: http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>>
>> Erlend
