Hi Mike,

I finally got everything working properly! What I did was switch to /protocol-http/ and move the following properties from /nutch-site.xml/ to /mapred-default.xml/:

  <property>
    <name>mapred.map.tasks</name>
    <value>100</value>
    <description>The default number of map tasks per job. Typically set
    to a prime several times greater than number of available hosts.
    Ignored when mapred.job.tracker is "local".</description>
  </property>

  <property>
    <name>mapred.reduce.tasks</name>
    <value>40</value>
    <description>The default number of reduce tasks per job. Typically set
    to a prime close to the number of available hosts.
    Ignored when mapred.job.tracker is "local".</description>
  </property>

I then injected 100'000 URLs and grepped the logs on my 4 slaves to check
whether the fetched URLs summed to 100'000. They did :)

In the end, there was no need to comment out line 211 of /Generator.java/.

Hope it helps,
--Flo

Mike Smith wrote:
>Hi Florent,
>
>Thanks for the inquiry and reply. I did some more tests based on your
>suggestion.
>Using the old protocol-http, the problem is solved on a single machine. But
>when I have datanodes running on two other machines, the problem still
>exists, although the number of unfetched pages is smaller than before.
>These are my tests:
>
>Injected URLs: 80000
>Only one machine is a datanode: 70000 fetched pages
>map tasks: 3
>reduce tasks: 3
>threads: 250
>
>Injected URLs: 80000
>3 machines are datanodes. All machines participated in the fetching,
>judging by the task tracker logs on the three machines: 20000 fetched pages
>map tasks: 12
>reduce tasks: 6
>threads: 250
>
>Injected URLs: 5000
>3 machines are datanodes. All machines participated in the fetching,
>judging by the task tracker logs on the three machines: 1200 fetched pages
>map tasks: 12
>reduce tasks: 6
>threads: 250
>
>Injected URLs: 1000
>3 machines are datanodes. All machines participated in the fetching,
>judging by the task tracker logs on the three machines: 240 fetched pages
>
>Injected URLs: 1000
>Only one machine is a datanode: 800 fetched pages
>map tasks: 3
>reduce tasks: 3
>threads: 250
>
>I also commented out line 211 of Generator.java, but it didn't change the
>situation.
>
>I'll try to do some more testing.
>
>Thanks, Mike
>
>On 1/19/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
>>Florent Gluck wrote:
>>
>>>I then decided to switch to using the old http protocol plugin:
>>>protocol-http (in nutch-default.xml) instead of protocol-httpclient.
>>>With the old protocol I got 50000 as expected.
>>
>>There have been a number of complaints about unreliable fetching with
>>protocol-httpclient, so I've switched the default back to protocol-http.
>>
>>Doug
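For reference, here is a small sketch that computes the fetched/injected ratio for each of Mike's runs quoted above. The numbers are copied straight from his tests; the class, record, and method names are just illustrative and not part of Nutch or Hadoop (records need Java 16 or later).

```java
import java.util.List;

/** Fetched/injected ratios for the test runs reported in this thread (illustrative only). */
public class FetchRatios {

    /** One test run: number of datanodes, URLs injected, pages fetched. */
    record Run(int datanodes, int injected, int fetched) {}

    /** Fraction of the injected URLs that were actually fetched. */
    static double ratio(int fetched, int injected) {
        return (double) fetched / injected;
    }

    public static void main(String[] args) {
        // Figures copied from Mike's tests in the quoted message.
        List<Run> runs = List.of(
            new Run(1, 80000, 70000),
            new Run(3, 80000, 20000),
            new Run(3, 5000, 1200),
            new Run(3, 1000, 240),
            new Run(1, 1000, 800));

        for (Run r : runs) {
            System.out.printf("datanodes=%d injected=%d fetched=%d ratio=%.3f%n",
                r.datanodes(), r.injected(), r.fetched(),
                ratio(r.fetched(), r.injected()));
        }
    }
}
```

The single-datanode runs come out at 0.875 and 0.800, while every three-datanode run lands at roughly 0.24 to 0.25, whatever the number of injected URLs.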
