Doug,
the crawler was just an example; however, you are right, and I agree with the KISS development concept.
I don't think it makes much sense to have a 'grid' of 100 boxes where all boxes crawl or all boxes hold one segment.
Maybe the idea in your blog post 'Dynamization and Lucene' can play an interesting role.
Anyway, I would love to hear more about the plans for how Nutch will implement the map/reduce concept. In particular, how job assignment will work would be interesting to discuss.
By the way, if Nutch starts the map & reduce work, which will be a dramatic change to the architecture, I would love to do the JMX work as part of that step.
I tried several times to write adapters for the existing code, but I failed each time because of the code structure.
I would love to commit to this, so do people think it would be interesting for Nutch to run map & reduce on top of JMX?
Stefan
On 17.01.2005 at 18:11, Doug Cutting wrote:
Stefan Groschupf wrote:
The Google File System supports multiple clients writing to one file (or maybe one chunk).
If we port Nutch functionality to map and reduce, this would be very useful as well.
For example, a set of crawlers writing to one 'segment file'.
Does the current implementation of NDFS support this functionality as well?
NDFS does not currently implement multiple writers to the same file, nor are there any plans to implement that at present.
I don't think it would make crawling much simpler. Long-term, most database operations will be map/reduce-based, including fetchlist generation and db update. Map/reduce naturally accepts sets of files as input and produces sets of files as output. So having each fetcher agent read and write separate segment files works well with map/reduce.
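To make the "sets of files in, sets of files out" point concrete, here is a minimal sketch in Python. It is not Nutch or NDFS code: the names `map_segment`, `partition`, and `run_job`, the one-URL-per-line segment format, and the `part-NNNNN.txt` output naming are all assumptions made up for illustration. Each input segment is read on its own, and each output file is written by exactly one writer, so nothing here needs multiple writers to a single file.

```python
import zlib
from collections import defaultdict
from pathlib import Path
from urllib.parse import urlsplit


def map_segment(segment_path):
    """Map step: emit (host, 1) for each URL line in one segment file."""
    for line in Path(segment_path).read_text().splitlines():
        url = line.strip()
        if url:
            yield urlsplit(url).netloc, 1


def partition(key, num_outputs):
    """Pick an output file for a key; crc32 is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_outputs


def run_job(input_files, output_dir, num_outputs=2):
    """Read a set of input files and write a set of output files."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Shuffle: route every mapped pair to the bucket that owns its key.
    buckets = [defaultdict(int) for _ in range(num_outputs)]
    for segment_path in input_files:
        for host, count in map_segment(segment_path):
            # Reduce step folded in: sum the counts per key.
            buckets[partition(host, num_outputs)][host] += count

    # Each bucket becomes its own output file; no file has two writers.
    output_files = []
    for index, totals in enumerate(buckets):
        out_path = output_dir / f"part-{index:05d}.txt"
        lines = [f"{host}\t{n}\n" for host, n in sorted(totals.items())]
        out_path.write_text("".join(lines))
        output_files.append(out_path)
    return output_files
```

In this sketch a job over the segment files written by several fetchers is just `run_job([seg_a, seg_b], "out")`; adding another fetcher means adding another input file, not sharing one.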
In the GFS paper, I think the multiple-writers feature is used to implement queues. Perhaps someday we will discover a critical need for queues of this sort, but right now we want to spare ourselves the complexity it would add to the file system.
Doug
------------------------------------------------------- The SF.Net email is sponsored by: Beat the post-holiday blues Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek. It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt _______________________________________________ Nutch-developers mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/nutch-developers
