I have implemented a quick-and-dirty global locking scheme:

I am currently testing it, but I would like to get other people's ideas on
it:

I used RMI for this purpose:

An RMI server that implements two methods {
boolean lock(String urlString);
void unlock(String urlString);
}
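For concreteness, the remote interface might look something like this. The interface name `LockService` is my own placeholder, not necessarily what the actual code uses; the only requirement RMI imposes is extending `Remote` and declaring `RemoteException` on every method:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface for the global lock server.
// Every RMI-exposed method must declare RemoteException.
interface LockService extends Remote {
    boolean lock(String urlString) throws RemoteException;
    void unlock(String urlString) throws RemoteException;
}
```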

The server holds a Map<key, val> where the key is an Integer (the host hash) and the
val is a very simplistic class:

import java.net.MalformedURLException;
import java.net.URL;

public class LockObj {
  private int hash;
  private long start;
  private long timeout;
  private int max_locks;
  private int locks = 0;

  public LockObj(int hash, long timeout, int max_locks) {
    this.hash = hash;
    this.timeout = timeout;
    start = System.currentTimeMillis();
    this.max_locks = max_locks;
  }

  // grant a lock unless max_locks are already held.
  // the methods synchronize on `this`, so the nested sync_obj
  // block of the first draft was redundant and has been dropped.
  public synchronized boolean lock() {
    if (locks < max_locks) {  // was `locks+1 < max_locks`: off by one,
                              // so max_locks could never actually be reached
      locks++;
      return true;
    }
    return false;
  }

  public synchronized void unlock() {
    if (locks > 0) {
      locks--;
    }
  }

  // synchronized so readers see the current count
  public synchronized int locks() {
    return locks;
  }

  // convert the host part of a url to a hash;
  // on a malformed url, hash the raw input string instead
  public static int make_hash(String urlString) {
    URL url = null;
    try {
      url = new URL(urlString);
    } catch (MalformedURLException e) {
      // fall through: hash the raw string
    }
    return (url == null ? urlString : url.getHost()).hashCode();
  }

  // check whether this object's timeout has been reached.
  // later: implement a listener event
  public boolean timeout_reached() {
    return (System.currentTimeMillis() - start) > timeout;
  }

  // free all locks at once
  public synchronized void unlock_all() {
    locks = 0;
  }

  public int hash() {
    return hash;
  }
}
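Not shown above is the server side that owns the map. A rough sketch of how it might look follows; the class name `LockServer` and the default timeout/max values are my assumptions (the post does not give them), and a trimmed copy of LockObj is inlined so the snippet stands alone. The real object would be exported via RMI rather than called directly:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Trimmed copy of LockObj so this sketch compiles on its own.
class LockObj {
    private final long start;
    private final long timeout;
    private final int max_locks;
    private int locks = 0;

    LockObj(long timeout, int max_locks) {
        this.timeout = timeout;
        this.max_locks = max_locks;
        this.start = System.currentTimeMillis();
    }

    synchronized boolean lock() {
        if (locks < max_locks) { locks++; return true; }
        return false;
    }

    synchronized void unlock() {
        if (locks > 0) locks--;
    }

    boolean timeout_reached() {
        return System.currentTimeMillis() - start > timeout;
    }

    static int make_hash(String urlString) {
        try {
            return new URL(urlString).getHost().hashCode();
        } catch (MalformedURLException e) {
            return urlString.hashCode();  // fall back to the raw string
        }
    }
}

// Hypothetical server-side manager: maps host hash -> LockObj.
// In the real setup this would be the RMI remote object.
class LockServer {
    private final Map<Integer, LockObj> locks = new HashMap<>();

    // assumed defaults; the real values are not given in the post
    private static final long DEFAULT_TIMEOUT = 60_000L;
    private static final int DEFAULT_MAX_LOCKS = 1;

    public synchronized boolean lock(String urlString) {
        int hash = LockObj.make_hash(urlString);
        LockObj obj = locks.get(hash);
        if (obj == null || obj.timeout_reached()) {
            // no lock object for this host yet, or it expired: start fresh
            obj = new LockObj(DEFAULT_TIMEOUT, DEFAULT_MAX_LOCKS);
            locks.put(hash, obj);
        }
        return obj.lock();
    }

    public synchronized void unlock(String urlString) {
        LockObj obj = locks.get(LockObj.make_hash(urlString));
        if (obj != null) obj.unlock();
    }
}
```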

Not the prettiest thing, but it just cleared the first hurdle... it
worked!!!


I changed the FetcherThread constructor to create an instance of
SyncManager.

Also, in the run method I try to get a lock on the host. If not
successful, I add the URL to an ArrayList of <key, datum> pairs for later
processing...
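The lock-or-defer pattern in the run method would look roughly like this. Everything here is a stand-in: the real SyncManager would talk to the RMI lock server rather than hold a local set, and the real loop lives inside FetcherThread:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-in for the RMI client wrapper; the real one would call the
// remote lock()/unlock() methods on the lock server.
class SyncManager {
    private final Set<String> held = new HashSet<>();

    synchronized boolean lock(String url) { return held.add(hostOf(url)); }
    synchronized void unlock(String url) { held.remove(hostOf(url)); }

    private static String hostOf(String url) {
        try {
            return new URL(url).getHost();
        } catch (MalformedURLException e) {
            return url;  // fall back to the raw string, as in make_hash
        }
    }
}

// Sketch of the fetch-or-defer logic from the run method.
class FetchLoop {
    private final SyncManager sync;
    final List<String> deferred = new ArrayList<>(); // urls to retry later

    FetchLoop(SyncManager sync) { this.sync = sync; }

    void fetch(String url) {
        if (sync.lock(url)) {
            try {
                // ... the actual page fetch would happen here ...
            } finally {
                sync.unlock(url);
            }
        } else {
            deferred.add(url); // host busy: queue for later processing
        }
    }
}
```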

I also changed the generator to put each URL into a separate array so all
fetchlists are even.

I would appreciate your comments and any suggestions for improvement.

The RMI is a little cumbersome, but hey... for now it works for 5 task
trackers without a problem (so it seems) :)


Gal




On Wed, 2006-02-15 at 14:55 -0800, Doug Cutting wrote:
> Andrzej Bialecki wrote:
> > (FYI: if you wonder how it was working before, the trick was to generate 
> > just 1 split for the fetch job, which then lead to just one task being 
> > created for any input fetchlist.
> 
> I don't think that's right.  The generator uses setNumReduceTasks() to 
> the desired number of fetch tasks, to control how many host-disjoint 
> fetchlists are generated.  Then the fetcher does not permit input files 
> to be split, so that fetch tasks remain host-disjoint.  So lots of 
> splits can be generated, by default one per mapred.map.tasks, permitting 
> lots of parallel fetching.
> 
> This should still work.  If it does not, I'd be interested to hear more 
> details.
> 
> Doug
> 




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
