Yes, I'm using trunk.

I think I found my problem. It works perfectly with one thread per host,
but if you set 2 threads per host, it doesn't wait for crawlDelay. I've
configured my Nutch to use 2 threads per host, which is why I had this issue.

In the code I can find:

    nextFetchTime.set(endTime + (maxThreads > 1 ? minCrawlDelay : crawlDelay));

and

    this.minCrawlDelay = (long) (conf.getFloat("fetcher.server.min.delay", 0.0f) * 1000);

But fetcher.server.min.delay is not defined in nutch-default.xml, so
minCrawlDelay = 0 seconds. With more than one thread per host, it keeps
crawling without waiting.
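
To illustrate, here is a minimal standalone sketch of the delay selection as I
read it from the two lines quoted above. It is not the actual Fetcher2 code:
the class name CrawlDelaySketch and the helper methods are made up for this
example, and the 5 second crawlDelay is just the value I mentioned from
nutch-default.xml.

```java
// Hypothetical sketch; only the ternary expression and the
// "seconds * 1000" conversion are taken from the quoted Fetcher2 lines.
public class CrawlDelaySketch {

    // Converts a delay in seconds (as read from the configuration) to milliseconds.
    static long toMillis(float seconds) {
        return (long) (seconds * 1000);
    }

    // Same expression as in the quoted code: with more than one thread
    // per host, minCrawlDelay is used instead of crawlDelay.
    static long nextFetchTime(long endTime, int maxThreads,
                              long crawlDelay, long minCrawlDelay) {
        return endTime + (maxThreads > 1 ? minCrawlDelay : crawlDelay);
    }

    public static void main(String[] args) {
        long crawlDelay = toMillis(5.0f);     // assumed: 5s delay from nutch-default.xml
        long minCrawlDelay = toMillis(0.0f);  // default used when fetcher.server.min.delay is not set
        long endTime = System.currentTimeMillis();

        // One thread per host: the next fetch is pushed back by crawlDelay.
        System.out.println("1 thread:  wait "
            + (nextFetchTime(endTime, 1, crawlDelay, minCrawlDelay) - endTime) + " ms");

        // Two threads per host: the next fetch is pushed back by minCrawlDelay only.
        System.out.println("2 threads: wait "
            + (nextFetchTime(endTime, 2, crawlDelay, minCrawlDelay) - endTime) + " ms");
    }
}
```

With these assumed values the first case waits 5000 ms and the second waits
0 ms, which matches what I see during fetching.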

However, I'm wondering why we have 2 delays (minCrawlDelay and crawlDelay)
and why minCrawlDelay is set to 0. Is it a bug?

Could you please help me understand?

Thanks
EJ

> On 9/11/07, Emmanuel <[EMAIL PROTECTED]> wrote:
>> Yes, I have this problem during my fetching. I tried 3 times and each time
>> the process doesn't wait 5s as defined in nutch-default.xml.
>> Don't you have the problem?
>
> Are you using trunk? Trunk should not have this problem.
>
>>
>> > Emmanuel wrote:
>> >> I decided to use Fetcher2 instead of Fetcher and I noticed that
>> >> Fetcher2 doesn't act in a polite way. I mean it doesn't wait
>> >> fetcher.server.delay before doing another request on the same server.
>> >>
>> >> In Fetcher2 (on the latest version of trunk), someone has defined this
>> >> option:
>> >>     // set non-blocking & no-robots mode for HTTP protocol plugins.
>> >>     getConf().setBoolean(Protocol.CHECK_BLOCKING, false);
>> >>     getConf().setBoolean(Protocol.CHECK_ROBOTS, false);
>> >>
>> >> In this case, the HTTP protocol doesn't wait for crawlDelay before doing
>> >> another request.
>> >> May I know exactly why?
>> >> Is it normal or a bug?
>> >>
>> >
>> > Have you actually observed this wrong behavior during fetching? Fetcher2
>> > performs blocking in a different way than Fetcher - it controls the
>> > blocking itself, instead of delegating it to the protocol plugin. These
>> > two properties are set to false on purpose.
>> >
>> >
>> > --
>> > Best regards,
>> > Andrzej Bialecki     <><
>> >   ___. ___ ___ ___ _ _   __________________________________
>> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> > http://www.sigram.com  Contact: info at sigram dot com
>> >
>> >
>>
>
>
> --
> Doğacan Güney
>
