Re: Hop count problem

Erlend Garåsen Mon, 12 Aug 2013 05:17:25 -0700

On 8/12/13 1:31 PM, Karl Wright wrote:

Based on your report that the test environment works OK, and the
production environment does not, I expect there is something like this
going on.  I know you attempted to fetch the intervening document from
your test environment, but it is conceivable that the production
environment is unable to get it.  You should see evidence of that in the
simple history, if so.

I have looked through the complete history regarding this host, and none of the other documents have ever been fetched. The only thing I can see is an illegal robots.txt file:

robots parse    www.ibsen.uio.no:80
        HTML    0       1       Robots file contained HTML, skipped

I don't think this robots file has stopped MCF from crawling the other documents, since I can see the same entry in our test environment as well. I even tried disabling the robots.txt checks, but the problem persists.

I forgot to mention that the hopcount mode is "Keep unreachable documents, forever".

So, if I understand you correctly, there is no point in hacking the database, since MCF will try to refetch unreachable documents anyway. I can of course enable HttpClient logging and check whether MCF tries to fetch these resources at all.


Erlend
