Yes, I'm using trunk.
I think I found my problem. It works perfectly with one thread per host, but
if you set 2 threads per host, it doesn't wait for crawlDelay. I've
configured my Nutch to use 2 threads per host, which is why I had this issue.
In the code I can find
nextFetchTime.set(endTime + (maxThreads > 1 ? minCrawlDelay : crawlDelay));
and
this.minCrawlDelay = (long) (conf.getFloat("fetcher.server.min.delay", 0.0f) * 1000);
But fetcher.server.min.delay is not defined in nutch-default.xml, so
minCrawlDelay falls back to 0 seconds and the fetcher keeps crawling
without waiting.
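
To make the effect concrete, here is a minimal standalone sketch of the two
lines quoted above. It is not the actual Fetcher2 code: the class name
CrawlDelaySketch, the helper methods, the Map standing in for the Hadoop
Configuration, and the 1.0f fallback for fetcher.server.delay are all
assumptions made for illustration. Only the two property names and the
ternary expression come from the source.

```java
import java.util.HashMap;
import java.util.Map;

/** Standalone illustration only; this is NOT the real Fetcher2 class. */
final class CrawlDelaySketch {

    /** Hypothetical stand-in for conf.getFloat(name, defaultValue). */
    static float getFloat(Map<String, String> conf, String name, float defaultValue) {
        String value = conf.get(name);
        return value == null ? defaultValue : Float.parseFloat(value);
    }

    /** Reads a delay given in seconds and converts it to milliseconds. */
    static long delayMillis(Map<String, String> conf, String name, float defaultSeconds) {
        return (long) (getFloat(conf, name, defaultSeconds) * 1000);
    }

    /** Same selection as the quoted line: minCrawlDelay when maxThreads > 1, else crawlDelay. */
    static long nextFetchTime(long endTime, int maxThreads, long minCrawlDelay, long crawlDelay) {
        return endTime + (maxThreads > 1 ? minCrawlDelay : crawlDelay);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.server.delay", "5.0"); // the 5s discussed in this thread
        // fetcher.server.min.delay is deliberately left unset, as in nutch-default.xml

        // The 1.0f fallback is an assumption; it is unused here because the property is set.
        long crawlDelay = delayMillis(conf, "fetcher.server.delay", 1.0f);
        long minCrawlDelay = delayMillis(conf, "fetcher.server.min.delay", 0.0f);

        long endTime = 1000L; // arbitrary timestamp in milliseconds
        System.out.println("1 thread per host:  wait "
                + (nextFetchTime(endTime, 1, minCrawlDelay, crawlDelay) - endTime) + " ms");
        System.out.println("2 threads per host: wait "
                + (nextFetchTime(endTime, 2, minCrawlDelay, crawlDelay) - endTime) + " ms");
    }
}
```

With one thread per host the sketch waits the full crawlDelay; with two
threads it uses minCrawlDelay, which is 0 while fetcher.server.min.delay is
unset. Setting that property in the configuration should restore a delay in
the multi-thread case, assuming the real code reads it the way the quoted
line suggests.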
However, I'm wondering why we have two delays (minCrawlDelay and
crawlDelay) and why minCrawlDelay defaults to 0. Is it a bug?
Could you please help me understand?
Thanks
EJ
> On 9/11/07, Emmanuel <[EMAIL PROTECTED]> wrote:
>> Yes, I have this problem during my fetching. I tried 3 times, and each
>> time the process doesn't wait 5s as defined in nutch-default.xml.
>> Don't you have the problem?
>
> Are you using trunk? Trunk should not have this problem.
>
>>
>> > Emmanuel wrote:
>> >> I decided to use Fetcher2 instead of Fetcher, and I noticed that
>> >> Fetcher2 doesn't behave politely. I mean it doesn't wait
>> >> fetcher.server.delay before making another request to the same
>> >> server.
>> >>
>> >> In Fetcher2 (in the latest version of trunk), someone has set these
>> >> options:
>> >> // set non-blocking & no-robots mode for HTTP protocol plugins.
>> >> getConf().setBoolean(Protocol.CHECK_BLOCKING, false);
>> >> getConf().setBoolean(Protocol.CHECK_ROBOTS, false);
>> >>
>> >> In this case, the HTTP protocol doesn't wait for crawlDelay before
>> >> making another request.
>> >> May I know exactly why?
>> >> Is it normal or a bug?
>> >>
>> >
>> > Have you actually observed this wrong behavior during fetching? Fetcher2
>> > performs blocking in a different way than Fetcher - it controls the
>> > blocking itself, instead of delegating it to the protocol plugin. These
>> > two properties are set to false on purpose.
>> >
>> >
>> > --
>> > Best regards,
>> > Andrzej Bialecki <><
>> > ___. ___ ___ ___ _ _ __________________________________
>> > [__ || __|__/|__||\/| Information Retrieval, Semantic Web
>> > ___|||__|| \| || | Embedded Unix, System Integration
>> > http://www.sigram.com Contact: info at sigram dot com
>> >
>> >
>>
>
>
> --
> Doğacan Güney
>