Hi Florent

Thanks for the inquiry and reply. I did some more tests based on your
suggestion.
Using the old protocol-http, the problem is solved on a single machine. But
when I have datanodes running on two other machines, the problem still exists,
although the number of unfetched pages is lower than before. These are my tests:

Injected URLs: 80000
Only one machine is a datanode: 70000 fetched pages
map tasks: 3
reduce tasks: 3
threads: 250

Injected URLs: 80000
3 machines are datanodes. All machines participated in the fetch, judging
by the task tracker logs on the three machines: 20000 fetched pages
map tasks: 12
reduce tasks: 6
threads: 250

Injected URLs: 5000
3 machines are datanodes. All machines participated in the fetch, judging
by the task tracker logs on the three machines: 1200 fetched pages
map tasks: 12
reduce tasks: 6
threads: 250


Injected URLs: 1000
3 machines are datanodes. All machines participated in the fetch, judging
by the task tracker logs on the three machines: 240 fetched pages

Injected URLs: 1000
Only one machine is a datanode: 800 fetched pages
map tasks: 3
reduce tasks: 3
threads: 250
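
To make the runs easier to compare, here is a small Python sketch that turns
the numbers above into fetched percentages. The figures are the ones listed
in the tests; the helper name fetched_percent is only for illustration and
is not part of Nutch.

```python
# (injected URLs, fetched pages, number of datanodes), copied from the tests above
runs = [
    (80000, 70000, 1),
    (80000, 20000, 3),
    (5000, 1200, 3),
    (1000, 240, 3),
    (1000, 800, 1),
]


def fetched_percent(injected, fetched):
    """Return the share of injected URLs that were fetched, in percent."""
    return 100.0 * fetched / injected


for injected, fetched, datanodes in runs:
    pct = fetched_percent(injected, fetched)
    print(f"{datanodes} datanode(s), {injected} injected: {pct:.1f}% fetched")
```

By this calculation the three-datanode runs all land at about 24-25%, while
the single-datanode runs reach 80% or more.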

I also commented out line 211 of Generator.java, but it didn't change the
situation.

I'll try to do some more testing.

Thanks, Mike

On 1/19/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Florent Gluck wrote:
> > I then decided to switch to using the old http protocol plugin:
> > protocol-http (in nutch-default.xml) instead of protocol-httpclient
> > With the old protocol I got 50000 as expected.
>
> There have been a number of complaints about unreliable fetching with
> protocol-httpclient, so I've switched the default back to protocol-http.
>
> Doug
>
