Are there any statistics available to monitor what percentage of the pairs remains in memory, and what percentage is written to disk?

And what are these exceptional cases that you mention?
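
(For reference, a sketch of how such numbers might be pulled from the job counters with the old mapred API; the SPILLED_RECORDS counter and its exact location are assumptions here -- check what your Hadoop release actually exposes:)

    // Sketch only: reading spill statistics from a finished job via the
    // old (pre-0.20) mapred API. Counter names are assumptions for this
    // Hadoop version.
    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.mapred.Task;

    public class SpillStats {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SpillStats.class);
            // ... configure input/output paths, mapper, and reducer here ...

            RunningJob job = JobClient.runJob(conf);  // blocks until completion
            Counters counters = job.getCounters();

            long mapOutput = counters.getCounter(Task.Counter.MAP_OUTPUT_RECORDS);
            long spilled   = counters.getCounter(Task.Counter.SPILLED_RECORDS);

            // A record spilled more than once is counted more than once,
            // so this ratio is only a rough indicator of disk traffic.
            System.out.printf("spilled/output = %d/%d (%.1f%%)%n",
                    spilled, mapOutput,
                    100.0 * spilled / Math.max(1, mapOutput));
        }
    }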


Hadoop goes to some lengths to make sure that things can stay in memory as
much as possible.  There are still cases, however, where intermediate
results are normally written to disk. That means implementors have those disk
time scales in their heads as they design things, which will inevitably make
the trade-offs somewhat poor compared to a system that never envisions
intermediate data being written to disk.
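
(As a rough illustration, these are the kinds of knobs that control how much intermediate data is buffered in RAM before spilling; the property names and values are assumptions for this era of Hadoop and should be checked against the release in use:)

    // Sketch only: configuration knobs that influence in-memory buffering
    // of intermediate data. Property names and defaults are assumptions.
    import org.apache.hadoop.mapred.JobConf;

    public class InMemoryTuning {
        public static void configure(JobConf conf) {
            // Map side: size (MB) of the in-memory sort buffer for map
            // output. A larger buffer means fewer spills to local disk.
            conf.set("io.sort.mb", "200");

            // Reduce side: size (MB) of the in-memory filesystem used to
            // hold shuffled map outputs before they are merged.
            conf.set("fs.inmemory.size.mb", "200");
        }
    }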

But beyond guesses like this, I couldn't actually say how it would turn
out, except that for very short jobs, moving jar files around and other
startup costs can dominate.

On Sun, Jun 1, 2008 at 5:05 AM, Martin Jaggi <[EMAIL PROTECTED]> wrote:


So in the case that all intermediate pairs fit into the RAM of the cluster,
does the InMemoryFileSystem already allow the intermediate phase to be done
without much disk access? Or what, in your opinion, would be the current
bottleneck in Hadoop in this scenario (huge computational load, not so much
data in/out)?
