Hi Mike,

I finally got everything working properly!
What I did was to switch to /protocol-http/ and move the following from
/nutch-site.xml/ to /mapred-default.xml/:

<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
  <description>The default number of map tasks per job.  Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>40</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.  Ignored when
  mapred.job.tracker is "local".
  </description>
</property>
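
In case it is useful, here is a minimal Python sketch for checking which
values a config file actually sets after a move like this. It is only an
illustration: the read_properties helper and the SAMPLE text are made up
for this example, and the <configuration> root element is an assumption
(the code accepts any root element name).

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the given XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    # iter() walks the whole tree, so the root element name does not matter.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Hypothetical file content holding the two properties from this message.
SAMPLE = """<configuration>
<property>
  <name>mapred.map.tasks</name>
  <value>100</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>40</value>
</property>
</configuration>"""

props = read_properties(SAMPLE)
print(props["mapred.map.tasks"], props["mapred.reduce.tasks"])
```

Running the same helper on the text of both nutch-site.xml and
mapred-default.xml would show which of the two files defines each
property.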

I then injected 100,000 URLs and grepped the logs on my 4 slaves to check
that the fetched URLs summed to 100,000.  They did :)
In the end, there was no need to comment out line 211 of Generator.java.
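
The check above can be sketched roughly as follows in Python. This is not
the exact command I ran: the "fetching <url>" log line format, the host
names and the sample lines are all assumptions made for illustration.

```python
import re

# Assumption: each fetch is logged as a line containing "fetching <url>".
FETCH_RE = re.compile(r"\bfetching\s+\S+")


def count_fetched(log_lines):
    """Count the log lines that look like a fetch, similar to `grep -c`."""
    return sum(1 for line in log_lines if FETCH_RE.search(line))


def fetched_per_host(logs_by_host):
    """Map each host name to the number of fetch lines found in its log."""
    return {host: count_fetched(lines) for host, lines in logs_by_host.items()}


# Hypothetical log excerpts from two slaves.
logs = {
    "slave1": [
        "060120 fetching http://example.com/a",
        "060120 some unrelated log line",
    ],
    "slave2": [
        "060120 fetching http://example.com/b",
        "060120 fetching http://example.com/c",
    ],
}

per_host = fetched_per_host(logs)
total = sum(per_host.values())
print(per_host, total)
```

The total is then compared with the number of injected URLs. Note that
this counts log lines, so a URL fetched twice would be counted twice.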

Hope it helps,
--Flo

Mike Smith wrote:

>Hi Florent
>
>Thanks for the inquiry and reply. I did some more tests based on your
>suggestion.
>Using the old protocol-http, the problem is solved on a single machine. But
>when I have datanodes running on two other machines, the problem still exists,
>although the number of unfetched pages is lower than before. These are my tests:
>
>Injected URL: 80000
>only one machine is a datanode: 70000 fetched pages
>map tasks: 3
>reduce tasks: 3
>threads: 250
>
>Injected URL: 80000
>3 machines are datanodes. All machines participated in the fetching, judging
>by the task tracker logs on the three machines: 20000 fetched pages
>map tasks: 12
>reduce tasks: 6
>threads: 250
>
>Injected URL: 5000
>3 machines are datanodes. All machines participated in the fetching, judging
>by the task tracker logs on the three machines: 1200 fetched pages
>map tasks: 12
>reduce tasks: 6
>threads: 250
>
>
>Injected URL: 1000
>3 machines are datanodes. All machines participated in the fetching, judging
>by the task tracker logs on the three machines: 240 fetched pages
>
>Injected URL: 1000
>only one machine is a datanode: 800 fetched pages
>map tasks: 3
>reduce tasks: 3
>threads: 250
>
>I also commented out line 211 of Generator.java, but it didn't change the
>situation.
>
>I'll try to do some more testing.
>
>Thanks, Mike
>
>On 1/19/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>  
>
>>Florent Gluck wrote:
>>    
>>
>>>I then decided to switch to using the old http protocol plugin:
>>>protocol-http (in nutch-default.xml) instead of protocol-httpclient
>>>With the old protocol I got 50000 as expected.
>>>      
>>>
>>There have been a number of complaints about unreliable fetching with
>>protocol-httpclient, so I've switched the default back to protocol-http.
>>
>>Doug
>>
>>    
>>
>
>  
>
