How does one set up pseudo-distributed mode with a local filesystem? Are you saying fs.default.name can be left as file:/// instead of being set to hdfs://? Do you then set mapred.job.tracker to file:/// as well?

Thanks,
Steve Cohen

On Tue, Sep 28, 2010 at 10:26 AM, Andrzej Bialecki <[email protected]> wrote:

> On 2010-09-28 14:27, Markus Jelsma wrote:
>
>> Thanks Andrzej,
>>
>> I will make an effort in getting it to run on Hadoop but i'd rather go for a
>> fully distributed set up (although with only a single node for now) so i can
>> add more machines later.
>
> That's what I meant, sorry for using jargon - pseudo-distributed is a
> "fully distributed Hadoop that runs on a single node". Please note that you
> don't have to use HDFS then - all nodes :) have direct access to the same
> local file system.
>
>> Will the HadoopNutch tutorial on the wiki allow me to
>> set up for a cluster on a single node? Also, will it then still make use of
>> multiple cores?
>
> Yes, because there will be multiple tasks running in parallel, in multiple
> processes, which will be likely run on different cores.
>
> As I said, the main big difference between using LocalJobTracker and a real
> JobTracker is that with LocalJobTracker:
>
> * all map tasks are run sequentially, there is no parallelism.
> * there is always one reduce task - if your dataset is large then this
>   single task will have to handle the sorting of the whole dataset, which may
>   take disproportionately longer than if the data were split among multiple
>   reduce tasks.
>
> Whereas with the JobTracker/TaskTracker, even when running on a single
> node:
>
> * tasks are run in separate processes and execute in parallel
> * there are many reduce tasks (as many as you configured), which handle
>   portions of the output dataset, and which execute also in parallel.
>
> So even on a single node a pseudo-distributed setup should be faster than
> running in local mode.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
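
For reference, a minimal sketch of the two settings being asked about, assuming Hadoop 0.20-era property names. As far as I know, mapred.job.tracker takes either the literal value "local" (which selects LocalJobTracker) or a host:port pair, never a filesystem URI, so fs.default.name can stay at file:/// while mapred.job.tracker points at a running JobTracker. The address localhost:9001 and the reduce count of 4 are assumed example values, and render_site_xml is a hypothetical helper, not part of Hadoop or Nutch.

```python
import xml.etree.ElementTree as ET


def render_site_xml(props):
    """Render a dict of property names and values as a Hadoop-style
    <configuration> document (hypothetical helper, not a Hadoop API)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# core-site.xml: keep the local filesystem, so no HDFS daemons are needed.
core_site = {"fs.default.name": "file:///"}

# mapred-site.xml: use a real JobTracker instead of "local".
# The host:port and the reduce count are assumed example values.
mapred_site = {
    "mapred.job.tracker": "localhost:9001",
    "mapred.reduce.tasks": "4",
}

print(render_site_xml(core_site))
print(render_site_xml(mapred_site))
```

With settings along these lines, the JobTracker and TaskTracker daemons still have to be started on the node; only the HDFS daemons can be skipped.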

