Re: What's the basic idea of pseudo-distributed Hadoop ?

Harsh J Fri, 14 Sep 2012 00:25:32 -0700

Hi Jason,

I think you're confusing the standalone mode with a pseudo-distributed
mode. The former is a limited mode of MR where no daemons need to be
deployed and the tasks run in a single JVM (via threads).

A pseudo-distributed cluster is one where all daemons run on a single
node. Hence it is not "distributed" in the sense of multiple nodes (no
network gear is involved), but the daemons talk to each other in the
same way (RPC, etc.) as they do between nodes in a fully-distributed
cluster.
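
As a rough illustration of how the three modes differ in configuration,
here is a minimal sketch. It assumes Hadoop 1.x property names
(fs.default.name, mapred.job.tracker), where mapred.job.tracker defaults
to "local" in standalone mode, and the localhost:9000/9001 values
commonly used for a single-node setup. The helper guess_mode and the
example.com hostnames are hypothetical, and the check is only a
heuristic, not something Hadoop itself provides.

```python
# Hypothetical helper: guess the run mode from a dict of Hadoop 1.x properties.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def guess_mode(conf):
    """Return 'standalone', 'pseudo-distributed' or 'fully-distributed'."""
    # With no JobTracker address configured, the job runs in a single JVM.
    job_tracker = conf.get("mapred.job.tracker", "local")
    if job_tracker == "local":
        return "standalone"
    # Daemons exist; decide whether they all live on this one machine.
    host = job_tracker.split(":")[0]
    if host in LOCAL_HOSTS:
        return "pseudo-distributed"
    return "fully-distributed"


standalone = {}  # defaults only: no daemons are deployed
pseudo = {
    "fs.default.name": "hdfs://localhost:9000",
    "mapred.job.tracker": "localhost:9001",
    "dfs.replication": "1",  # only one DataNode is available
}
full = {  # hypothetical hostnames
    "fs.default.name": "hdfs://namenode.example.com:9000",
    "mapred.job.tracker": "jobtracker.example.com:9001",
}

for name, conf in [("standalone", standalone), ("pseudo", pseudo), ("full", full)]:
    print(name, "->", guess_mode(conf))
```

The point is that pseudo-distributed and fully-distributed setups use
the same kind of settings (daemon addresses reached over RPC); only the
hosts differ.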

If an MR program works fine in pseudo-distributed mode, it "should"
work fine (no guarantee) in fully-distributed mode, provided all nodes
have the same arch/OS, the same JVM, and the same job-specific
configurations. This is because tasks execute on various nodes and may
be affected by a node whose behavior or setup differs from the others,
and that's something you'd have to detect/know about if that node
exhibits failures more often than the others.
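
One way to spot such a node is to compare a small environment
"fingerprint" across the cluster. The sketch below is only an
illustration: the node names, the (arch, OS, JVM) tuples, and the helper
odd_nodes are all hypothetical, and how you collect the values from
each machine is up to you.

```python
from collections import Counter


def odd_nodes(fingerprints):
    """Return the sorted names of nodes whose fingerprint differs from the most common one."""
    if not fingerprints:
        return []
    # The most frequent (arch, os, jvm) tuple is treated as the cluster norm.
    majority, _count = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(node for node, fp in fingerprints.items() if fp != majority)


# Hypothetical data: (architecture, operating system, JVM version) per node.
nodes = {
    "node1": ("amd64", "Linux", "1.6.0_31"),
    "node2": ("amd64", "Linux", "1.6.0_31"),
    "node3": ("i386", "Linux", "1.6.0_24"),
}

print(odd_nodes(nodes))  # ['node3']
```

A node flagged this way is a good first suspect when its tasks fail
more often than those on other nodes.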

On Fri, Sep 14, 2012 at 11:58 AM, Jason Yang <[email protected]> wrote:
> Hey, Kai
>
> Thanks for your reply.
>
> I was wondering what the difference is between pseudo-distributed and
> fully-distributed Hadoop, apart from the maximum number of map/reduce
> tasks.
>
> And if an MR program works fine in a pseudo-distributed cluster, will it
> work just as well in a fully-distributed cluster?
>
>
> 2012/9/14 Kai Voigt <[email protected]>
>>
>> The default setting is that a tasktracker can run up to two map and reduce
>> tasks in parallel (mapred.tasktracker.map.tasks.maximum and
>> mapred.tasktracker.reduce.tasks.maximum), so you will actually see some
>> concurrency on your one machine.
>
>
>
>
> --
> YANG, Lin
>

-- 
Harsh J
