Hi Jason,

I think you're confusing standalone (local) mode with pseudo-distributed mode. The former is a limited mode of MR in which no daemons are deployed and the tasks run inside a single JVM (as threads).
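To make the distinction concrete, here is a minimal sketch of how the modes differ in configuration, assuming Hadoop 1.x property names. Standalone is what you get with the defaults (mapred.job.tracker=local, fs.default.name=file:///). A pseudo-distributed setup points both properties at daemons on localhost. The run_mode helper and the example hosts/ports are hypothetical, for illustration only; this is not a Hadoop API.

```python
from urllib.parse import urlparse

# Hadoop 1.x defaults: no JobTracker (LocalJobRunner) and the local filesystem.
DEFAULTS = {
    "mapred.job.tracker": "local",
    "fs.default.name": "file:///",
}

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def run_mode(conf):
    """Hypothetical helper: guess the run mode from a dict of Hadoop 1.x properties.

    - standalone: mapred.job.tracker is "local", so tasks run in one JVM, no daemons.
    - pseudo-distributed: JobTracker (and NameNode, if HDFS is used) are on localhost.
    - fully-distributed: anything else (a heuristic, not an exact check).
    """
    merged = {**DEFAULTS, **conf}
    tracker = merged["mapred.job.tracker"]
    if tracker == "local":
        return "standalone"
    tracker_host = tracker.split(":")[0]
    # file:/// has no hostname; treat it as local.
    fs_host = urlparse(merged["fs.default.name"]).hostname
    fs_is_local = fs_host is None or fs_host in LOCAL_HOSTS
    if tracker_host in LOCAL_HOSTS and fs_is_local:
        return "pseudo-distributed"
    return "fully-distributed"


if __name__ == "__main__":
    print(run_mode({}))
    print(run_mode({"fs.default.name": "hdfs://localhost:9000",
                    "mapred.job.tracker": "localhost:9001"}))
    print(run_mode({"fs.default.name": "hdfs://nn.example.com:9000",
                    "mapred.job.tracker": "jt.example.com:9001"}))
```

The point is that pseudo- and fully-distributed modes use the same daemons and the same RPC paths. Only the addresses differ, whereas standalone skips the daemons entirely.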
A pseudo-distributed cluster is one where all daemons run on a single node. Hence it is not "distributed" in the multi-node sense (no network hardware is involved), but the daemons talk to each other exactly as they would across nodes (RPC, etc.) in a fully-distributed cluster. If an MR program works fine in pseudo-distributed mode, it "should" (no guarantee) also work fine in fully-distributed mode, provided that all nodes have the same architecture/OS, the same JVM, and the same job-specific configuration. This is because tasks execute on various nodes and may be affected by a node's behavior or setup that differs from the others, and that's something you'd have to detect and know about yourself if one node exhibits more failures than the others.

On Fri, Sep 14, 2012 at 11:58 AM, Jason Yang <[email protected]> wrote:
> Hey, Kai
>
> Thanks for your reply.
>
> I was wondering what the difference is between pseudo-distributed and
> fully-distributed Hadoop, apart from the maximum number of map/reduce tasks.
>
> And if an MR program works fine in a pseudo-distributed cluster, will it work
> exactly as well in a fully-distributed cluster?
>
> 2012/9/14 Kai Voigt <[email protected]>
>>
>> The default setting is that a tasktracker can run up to two map and two reduce
>> tasks in parallel (mapred.tasktracker.map.tasks.maximum and
>> mapred.tasktracker.reduce.tasks.maximum), so you will actually see some
>> concurrency on your one machine.
>
> --
> YANG, Lin

--
Harsh J
