Well, at the moment it solves the problem I mentioned yesterday, where all
tasktrackers access the same site at the same time when running under Hadoop.
It seems that calling job.setBoolean("mapred.speculative.execution", false);
didn't help, and I'm not sure why.

However, although it adds one more piece of software, it removes the need for
special treatment of the fetcher, i.e. special fetchlists built by the generator.
Each fetcher/tasktracker is now supposed to access hosts politely, even though
its list contains various hosts. Sometimes I noticed that the generator put
both hosts (there were only 2 hosts in the seed) into the same fetchlist,
which made only one tasktracker work instead of two.
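A minimal sketch of the "lock the host, otherwise defer the URL" loop described here and in the quoted message below, assuming a fetchlist of (host, url) pairs. HostLockRegistry, PoliteFetchLoop and fetchAll are hypothetical names. The real SyncManager was a global lock shared by all tasktrackers; this stand-in only models the try-lock-or-defer logic inside one JVM and does not model the politeness delay itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical in-process stand-in for a global host lock:
 * at most one holder per host name at a time.
 */
class HostLockRegistry {
    private final Set<String> lockedHosts = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller now owns the host, false if someone else does. */
    boolean tryLock(String host) {
        return lockedHosts.add(host);
    }

    void unlock(String host) {
        lockedHosts.remove(host);
    }
}

class PoliteFetchLoop {
    /**
     * Walks a fetchlist of {host, url} pairs. A URL whose host is locked is
     * deferred to the next pass instead of blocking the thread. Returns the
     * URLs in the order they were "fetched"; URLs still deferred after
     * maxPasses are not returned.
     */
    static List<String> fetchAll(List<String[]> fetchlist, HostLockRegistry locks, int maxPasses) {
        List<String> fetched = new ArrayList<>();
        Deque<String[]> pending = new ArrayDeque<>(fetchlist);
        for (int pass = 0; pass < maxPasses && !pending.isEmpty(); pass++) {
            Deque<String[]> deferred = new ArrayDeque<>();
            for (String[] entry : pending) {
                String host = entry[0];
                if (!locks.tryLock(host)) {
                    deferred.add(entry);   // host busy elsewhere: retry on a later pass
                    continue;
                }
                try {
                    fetched.add(entry[1]); // placeholder for the real fetch and per-host delay
                } finally {
                    locks.unlock(host);
                }
            }
            pending = deferred;
        }
        return fetched;
    }
}
```

The design choice is to skip rather than wait: a thread that finds a host locked moves on to other hosts in its mixed list, so one busy host does not stall the whole fetchlist.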

I'm sorry if it sounds a little confusing :) or unreasonable... :)

Gal



On Thu, 2006-02-16 at 13:47 -0800, Doug Cutting wrote:
> Gal Nitzan wrote:
> > I have implemented a down and dirty Global Locking:
> >  [ ... ]
> > 
> > I changed the FetcherThread constructor to create an instance of
> > SyncManager.
> > 
> > And also in the run method I try to get a lock on the host. If not
> > successful I add the url into an ArrayList<key,datum> for later
> > processing...
> > 
> > I also changed the generator to put each url into a separate array so all
> > fetchlists are even.
> 
> What problem does this fix?
> 
> Doug
> 
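A sketch of the generator change quoted above, assuming it deals URLs into fetchlists round-robin, contrasted with a partitioner that hashes the host name (an assumption about how the uneven fetchlists arose). EvenFetchlists, roundRobin, partitionByHost and hostOf are hypothetical names, not Nutch APIs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class EvenFetchlists {
    private static List<List<String>> emptyLists(int numLists) {
        if (numLists <= 0) throw new IllegalArgumentException("numLists must be positive");
        List<List<String>> lists = new ArrayList<>();
        for (int i = 0; i < numLists; i++) lists.add(new ArrayList<>());
        return lists;
    }

    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    /** Every URL of a host lands in the same list; with few hosts, lists can be empty. */
    static List<List<String>> partitionByHost(List<String> urls, int numLists) {
        List<List<String>> lists = emptyLists(numLists);
        for (String url : urls) {
            lists.get(Math.floorMod(hostOf(url).hashCode(), numLists)).add(url);
        }
        return lists;
    }

    /** Deals URLs out in turn, so list sizes differ by at most one. */
    static List<List<String>> roundRobin(List<String> urls, int numLists) {
        List<List<String>> lists = emptyLists(numLists);
        for (int i = 0; i < urls.size(); i++) {
            lists.get(i % numLists).add(urls.get(i));
        }
        return lists;
    }
}
```

With two fetchlists, "a.com" and "c.com" happen to hash to the same list, so partitionByHost gives one list every URL and leaves one tasktracker idle, while roundRobin splits them 2/2. The trade-off is that a round-robin list can contain any host, which is why a per-host lock is then needed to keep fetching polite.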




_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
