I have implemented a quick-and-dirty global locking scheme.
I am currently testing it, but I would like to get other people's ideas
on it.
I used RMI for this purpose: an RMI server which implements two methods:

    boolean lock(String urlString);
    void unlock(String urlString);
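Spelled out as a proper remote interface, that would look roughly like the
sketch below. The names (GlobalLockService, InMemoryLockService) are my own
guesses, not the actual code; RMI requires the remote methods to live in an
interface extending Remote, with every method declaring RemoteException.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical name for the remote locking interface described above.
interface GlobalLockService extends Remote {
    boolean lock(String urlString) throws RemoteException;
    void unlock(String urlString) throws RemoteException;
}

// A minimal in-process implementation, just to show the contract;
// a real server would extend UnicastRemoteObject and be bound in the
// RMI registry. Per-host bookkeeping is omitted here for brevity.
class InMemoryLockService implements GlobalLockService {
    private final int max;
    private int locks = 0;

    InMemoryLockService(int max) { this.max = max; }

    // Succeeds until 'max' locks are held concurrently.
    public synchronized boolean lock(String urlString) {
        if (locks < max) {
            locks++;
            return true;
        }
        return false;
    }

    public synchronized void unlock(String urlString) {
        if (locks > 0)
            locks--;
    }
}
```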
The server holds a map whose key is an Integer (the host hash) and whose
value is a very simplistic class:
import java.net.MalformedURLException;
import java.net.URL;

public class LockObj {
    private int hash;
    private long start;
    private long timeout;
    private int max_locks;
    private int locks = 0;

    public LockObj(int hash, long timeout, int max_locks) {
        this.hash = hash;
        this.timeout = timeout;
        start = System.currentTimeMillis();
        this.max_locks = max_locks;
    }

    // acquire one lock; fails once max_locks are held
    public synchronized boolean lock() {
        if (locks < max_locks) {
            locks++;
            return true;
        }
        return false;
    }

    public synchronized void unlock() {
        if (locks > 0)
            locks--;
    }

    public synchronized int locks() {
        return locks;
    }

    // convert the host part of a url to a hash;
    // if the url is malformed, hash the raw string instead
    public static int make_hash(String urlString) {
        URL url = null;
        try {
            url = new URL(urlString);
        } catch (MalformedURLException e) {
        }
        return (url == null ? urlString : url.getHost()).hashCode();
    }

    // check if this object's timeout has been reached
    // (later: implement a listener event)
    public boolean timeout_reached() {
        return (System.currentTimeMillis() - start) > timeout;
    }

    // free all locks
    public synchronized void unlock_all() {
        locks = 0;
    }

    public int hash() {
        return hash;
    }
}
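On the server side, the per-host lookup could be as simple as the sketch
below. The SyncServer and makeHash names are my assumptions, and LockObjLite
is a trimmed stand-in for the LockObj class above, just so the example is
self-contained:

```java
import java.util.HashMap;
import java.util.Map;

// Trimmed stand-in for LockObj: just the lock counter.
class LockObjLite {
    private final int maxLocks;
    private int locks = 0;

    LockObjLite(int maxLocks) { this.maxLocks = maxLocks; }

    synchronized boolean lock() {
        if (locks < maxLocks) {
            locks++;
            return true;
        }
        return false;
    }

    synchronized void unlock() {
        if (locks > 0)
            locks--;
    }
}

// Hypothetical server-side bookkeeping: one lock object per host hash.
class SyncServer {
    private final Map<Integer, LockObjLite> hosts = new HashMap<>();
    private final int maxLocksPerHost;

    SyncServer(int maxLocksPerHost) { this.maxLocksPerHost = maxLocksPerHost; }

    // Same contract as the RMI lock(String): all urls on one host
    // share one entry, keyed by the host hash.
    synchronized boolean lock(String urlString) {
        int hash = makeHash(urlString);
        return hosts
            .computeIfAbsent(hash, h -> new LockObjLite(maxLocksPerHost))
            .lock();
    }

    synchronized void unlock(String urlString) {
        LockObjLite l = hosts.get(makeHash(urlString));
        if (l != null)
            l.unlock();
    }

    // Falls back to hashing the raw string when the url does not parse.
    static int makeHash(String urlString) {
        try {
            return new java.net.URL(urlString).getHost().hashCode();
        } catch (java.net.MalformedURLException e) {
            return urlString.hashCode();
        }
    }
}
```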
Not the prettiest thing, but I just cleared the first hurdle... it
worked!!!
I changed the FetcherThread constructor to create an instance of
SyncManager.
Also, in the run method I try to get a lock on the host. If that is not
successful I add the url to an ArrayList for later
processing...
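That lock-or-defer pattern in the run method amounts to something like the
sketch below. The HostLocker interface, DeferringFetcher, and fetchOrDefer
are stand-in names of mine, not the actual SyncManager/FetcherThread code,
and I am assuming the lock is released as soon as the fetch finishes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SyncManager the FetcherThread would call.
interface HostLocker {
    boolean lock(String urlString);
    void unlock(String urlString);
}

class DeferringFetcher {
    private final HostLocker locker;
    // urls whose host was at its lock cap; retried in a later pass
    private final List<String> deferred = new ArrayList<>();

    DeferringFetcher(HostLocker locker) { this.locker = locker; }

    // Returns true if the url was fetched now, false if it was deferred.
    boolean fetchOrDefer(String urlString) {
        if (!locker.lock(urlString)) {
            deferred.add(urlString);
            return false;
        }
        try {
            // ... the actual fetch would happen here ...
            return true;
        } finally {
            locker.unlock(urlString);
        }
    }

    List<String> deferred() { return deferred; }
}
```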
I also changed the generator to put each url into a separate array so all
fetchlists are even.
I would appreciate your comments and any suggestions for improvement.
The RMI is a little cumbersome, but hey... for now it works for 5 task
trackers without a problem (so it seems) :)
Gal
On Wed, 2006-02-15 at 14:55 -0800, Doug Cutting wrote:
> Andrzej Bialecki wrote:
> > (FYI: if you wonder how it was working before, the trick was to generate
> > just 1 split for the fetch job, which then led to just one task being
> > created for any input fetchlist.
>
> I don't think that's right. The generator uses setNumReduceTasks() to
> set the desired number of fetch tasks, to control how many host-disjoint
> fetchlists are generated. Then the fetcher does not permit input files
> to be split, so that fetch tasks remain host-disjoint. So lots of
> splits can be generated, by default one per mapred.map.tasks, permitting
> lots of parallel fetching.
>
> This should still work. If it does not, I'd be interested to hear more
> details.
>
> Doug
>