Byron Miller wrote:
What is the status of map reduce?  i just got finished reading your paper
and all of the threads and i'm drooling over the notion of such a system :)

We have an initial implementation of MapReduce that works, but it has not yet been used heavily and thus probably needs improvement. Next month I plan to start porting all of Nutch's algorithms to sit on MapReduce, as outlined in:


http://www.mail-archive.com/[email protected]/msg03754.html

In the first iteration I will probably not implement full link analysis, only inlink counts and text. Nor will I implement continuous fetching: one will still alternate between fetching and updating the page db. But updating the page db should be much more scalable. Also, no link db will be maintained while fetching. I hope to have this working by June or so and start trying to use it to build billion-page-scale indexes for the Internet Archive. These plans are subject to change.
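
To make "inlink counts" on MapReduce concrete, here is a rough sketch of how such a count might be expressed as a map step and a reduce step. It is written in Python for brevity rather than Nutch's Java, runs in a single process, and is not the Nutch implementation: the function names (map_outlinks, reduce_inlinks, count_inlinks) and the sample URLs are made up for illustration.

```python
from collections import defaultdict


def map_outlinks(page_url, outlinks):
    """Map step (hypothetical): emit (target_url, 1) for each outlink of a page."""
    for target in outlinks:
        yield target, 1


def reduce_inlinks(target_url, counts):
    """Reduce step (hypothetical): sum the 1s emitted for a single target URL."""
    return target_url, sum(counts)


def count_inlinks(pages):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    grouped = defaultdict(list)
    for page_url, outlinks in pages.items():
        for target, one in map_outlinks(page_url, outlinks):
            grouped[target].append(one)
    return dict(reduce_inlinks(target, counts) for target, counts in grouped.items())


# Made-up sample data: each page URL maps to the URLs it links to.
pages = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
inlinks = count_inlinks(pages)
```

In a real MapReduce job the grouping by target URL would be done by the framework across machines; only the map and reduce functions would be written by hand.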

Doug
