All right, I got it. Thanks to all of you.
2012/9/14 Bertrand Dechoux <[email protected]>

> The only difference between pseudo-distributed and fully distributed would
> be scale. You could say that code that runs fine on the former runs fine
> on the latter too. But that does not necessarily mean that the performance
> will scale the same way (i.e., if you keep a list of elements in memory,
> at a bigger scale you could get an OOME).
>
> Of course, as has been implied in previous answers, you can't say the
> same about standalone. In that mode, you could use global mutable static
> state and think it's fine, without caring about distribution between the
> nodes. In that case, the same code launched on pseudo-distributed will fail
> to replicate the same results.
>
> Regards
>
> Bertrand
>
>
> On Fri, Sep 14, 2012 at 9:24 AM, Harsh J <[email protected]> wrote:
>
>> Hi Jason,
>>
>> I think you're confusing the standalone mode with a pseudo-distributed
>> mode. The former is a limited mode of MR where no daemons need to be
>> deployed and the tasks run in a single JVM (via threads).
>>
>> A pseudo-distributed cluster is a cluster where all daemons are
>> running on one node itself. Hence, it is not "distributed" in the sense
>> of multiple nodes (no use of any network gear), but it works in the same
>> way between nodes (RPC, etc.) as a fully-distributed one.
>>
>> If an MR program works fine in a pseudo-distributed mode, it "should"
>> work (no guarantee) fine in a fully-distributed mode iff all nodes
>> have the same arch/OS, same JVM, and job-specific configurations. This
>> is because tasks execute on various nodes and may be affected by a
>> node's behavior or setup that is different from the others, and that's
>> something you'd have to detect/know about if it exhibits failures more
>> than others.
>>
>> On Fri, Sep 14, 2012 at 11:58 AM, Jason Yang <[email protected]>
>> wrote:
>> > Hey, Kai
>> >
>> > Thanks for your reply.
>> >
>> > I was wondering what the difference is between pseudo-distributed and
>> > fully-distributed Hadoop, except the maximum number of map/reduce tasks.
>> >
>> > And if an MR program works fine in a pseudo-distributed cluster, will
>> > it work exactly as well in a fully-distributed cluster?
>> >
>> >
>> > 2012/9/14 Kai Voigt <[email protected]>
>> >>
>> >> The default setting is that a tasktracker can run up to two map and
>> >> two reduce tasks in parallel (mapred.tasktracker.map.tasks.maximum and
>> >> mapred.tasktracker.reduce.tasks.maximum), so you will actually see
>> >> some concurrency on your one machine.
>> >
>> >
>> > --
>> > YANG, Lin
>> >
>>
>>
>> --
>> Harsh J
>>
>
>
> --
> Bertrand Dechoux
>

--
YANG, Lin
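
For reference, Bertrand's point about global mutable static state can be sketched in plain Java. This is only a simulation under stated assumptions, not Hadoop code: the names StaticStateDemo, SimulatedJvm, runTask, driverViewStandalone and driverViewDistributed are all hypothetical, and a SimulatedJvm object merely stands in for one JVM process, with its counter field playing the role of a static field. The tasks also run one after another here rather than in threads, which does not change the outcome.

```java
import java.util.Arrays;
import java.util.List;

class StaticStateDemo {

    /** Stands in for one JVM process; `counter` plays the role of a static field. */
    static final class SimulatedJvm {
        int counter = 0;
    }

    /** A task adds its record count to the "static" counter of the JVM it runs in. */
    static void runTask(SimulatedJvm jvm, List<String> split) {
        jvm.counter += split.size();
    }

    /** Standalone mode: every task runs inside the driver's own JVM. */
    static int driverViewStandalone(List<List<String>> splits) {
        SimulatedJvm driver = new SimulatedJvm();
        for (List<String> split : splits) {
            runTask(driver, split); // the update lands in the driver's state
        }
        return driver.counter;
    }

    /** Pseudo- or fully-distributed mode: each task runs in a separate child JVM. */
    static int driverViewDistributed(List<List<String>> splits) {
        SimulatedJvm driver = new SimulatedJvm();
        for (List<String> split : splits) {
            SimulatedJvm child = new SimulatedJvm();
            runTask(child, split); // the update stays in the child and is lost
        }
        return driver.counter;     // the driver never sees the children's updates
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d", "e"));
        System.out.println("standalone driver sees:  " + driverViewStandalone(splits));
        System.out.println("distributed driver sees: " + driverViewDistributed(splits));
    }
}
```

With the two sample splits above, the standalone run reports 5 and the distributed run reports 0. That is the kind of mismatch to expect when a job relies on static fields instead of passing results through the framework (for example, via job output or counters).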
