If you don't plan to use HDFS, what kind of shared file system are you going 
to use between the cluster nodes? NFS? For what you want to do, even though it 
doesn't make much sense, you first need to solve this shared file system 
problem.
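For example, if every node mounts the same NFS export at the same local path, a core-site.xml along these lines points Hadoop at the plain local file system instead of HDFS (a sketch only; the mount point /mnt/nfs is an assumption):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Use the local/NFS file system instead of hdfs:// -->
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
```

Job input and output paths are then given under the shared mount, e.g. file:///mnt/nfs/input, and every node must see that path identically.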
Second, if you want to process the files file by file, instead of block by 
block as in HDFS, then you need to use a WholeFileInputFormat (google how to 
write one). Then you don't need a file listing all the files to be processed; 
just put them into one folder on the shared file system and pass that folder 
to your MR job. As long as each node can access it through some file system 
URL, each file will be processed in its own mapper.
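One way such a WholeFileInputFormat can be written (a sketch against the org.apache.hadoop.mapreduce API; the class names are mine, not a stock Hadoop class) is to disable splitting and have the record reader emit each whole file as a single value:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// One whole file per split, so each file goes to exactly one mapper.
public class WholeFileInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;  // never break a file into block-sized splits
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }
}

// Emits a single record: the entire file content as one BytesWritable.
class WholeFileRecordReader
        extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit split;
    private TaskAttemptContext context;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
        this.split = (FileSplit) split;
        this.context = context;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        if (processed) {
            return false;
        }
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        FSDataInputStream in = fs.open(file);
        try {
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        processed = true;
        return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { }
}
```

In the driver you would then call FileInputFormat.addInputPath with the folder holding the files, and job.setInputFormatClass(WholeFileInputFormat.class); Hadoop creates one map task per file.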
Yong

Date: Wed, 21 Aug 2013 17:39:10 +0530
Subject: running map tasks in remote node
From: [email protected]
To: [email protected]

Hello, 
Here is the newbie question of the day. For one of my use cases, I want to use 
hadoop map reduce without HDFS. Here, I will have a text file containing a list 
of file names to process. Assume that I have 10 lines (10 files to process) in 
the input text file and I wish to generate 10 map tasks and execute them in 
parallel on 10 nodes. I started with the basic tutorial on hadoop and could set 
up a single-node hadoop cluster and successfully tested the wordcount code.
 Now, I took two machines, A (master) and B (slave). I did the below 
configuration on these machines to set up a two-node cluster.
 hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/tmp/hadoop-bala/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/tmp/hadoop-bala/dfs/data</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>A:9001</value>
  </property>
</configuration>

mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>A:9001</value>
  </property>
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
 core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://A:9000</value>
  </property>
</configuration>
 On both A and B, I have a file named ‘slaves’ with an entry ‘B’ in it, and 
another file called ‘masters’ containing the entry ‘A’.
 I have kept my input file on A. I see the map method processing the input file 
line by line, but the lines are all processed on A. Ideally, I would expect 
that processing to take place on B.
 Can anyone highlight where I am going wrong?
  regards
rab
