Hi Florent,

Thanks for the inquiry and reply. I did some more tests based on your suggestion. With the old protocol-http plugin the problem is solved on a single machine. But when datanodes are also running on two other machines the problem still exists, although the number of unfetched pages is lower than before. These are my tests:

Injected URLs: 80000
Only one machine is a datanode: 70000 fetched pages
map tasks: 3
reduce tasks: 3
threads: 250

Injected URLs: 80000
3 machines are datanodes. All machines participated in the fetching, judging by the task tracker logs on the three machines: 20000 fetched pages
map tasks: 12
reduce tasks: 6
threads: 250

Injected URLs: 5000
3 machines are datanodes. All machines participated in the fetching, judging by the task tracker logs on the three machines: 1200 fetched pages
map tasks: 12
reduce tasks: 6
threads: 250

Injected URLs: 1000
3 machines are datanodes. All machines participated in the fetching, judging by the task tracker logs on the three machines: 240 fetched pages

Injected URLs: 1000
Only one machine is a datanode: 800 fetched pages
map tasks: 3
reduce tasks: 3
threads: 250

I also commented out line 211 of Generator.java, but it didn't change the situation.
I'll try to do some more testing.

Thanks,
Mike

On 1/19/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Florent Gluck wrote:
> > I then decided to switch to using the old http protocol plugin:
> > protocol-http (in nutch-default.xml) instead of protocol-httpclient
> > With the old protocol I got 50000 as expected.
>
> There have been a number of complaints about unreliable fetching with
> protocol-httpclient, so I've switched the default back to protocol-http.
>
> Doug
>
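
To compare the runs listed in this message more easily, the fetch ratios can be worked out with a short script. This is only a sketch: the counts are copied from the tests in this message, and the names `runs` and `fetch_ratio` are made up for illustration, not part of Nutch.

```python
# Fetch counts copied from the test runs in this message.
# Each tuple: (number of datanodes, injected URLs, fetched pages).
runs = [
    (1, 80000, 70000),
    (3, 80000, 20000),
    (3, 5000, 1200),
    (3, 1000, 240),
    (1, 1000, 800),
]


def fetch_ratio(injected, fetched):
    """Return the fraction of injected URLs that were actually fetched."""
    return fetched / injected


# Print one summary line per run.
for datanodes, injected, fetched in runs:
    pct = 100 * fetch_ratio(injected, fetched)
    print(f"{datanodes} datanode(s), {injected} injected: {pct:.1f}% fetched")
```

With these numbers, the three-datanode runs all come out at roughly 24 to 25 percent fetched, while the single-datanode runs reach 80 percent or more.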
