I have written a WordCount.java job in this manner:
conf.setMapperClass(Map.class);
conf.setCombinerClass(Combine.class);
conf.setReducerClass(Reduce.class);
So, you can see that three classes are being used here. I have
packaged these classes into a jar file called wc
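For context, the driver around those three calls looks roughly like the sketch below. This is only a sketch against the old org.apache.hadoop.mapred API; the input/output paths and output types are placeholders, and Map, Combine and Reduce are the classes mentioned above (not shown here).

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCount {
    public static void main(String[] args) throws Exception {
        // Configure the job with the old (org.apache.hadoop.mapred) API
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // The three classes packaged into the wc jar
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Combine.class);
        conf.setReducerClass(Reduce.class);

        // Output types (placeholders for whatever the job actually emits)
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // Placeholder input/output locations
        FileInputFormat.setInputPaths(conf, new Path("/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/output"));

        JobClient.runJob(conf);
    }
}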
I was going through the tutorial here.
http://hadoop.apache.org/core/docs/current/cluster_setup.html
Certain things are not clear, so I am asking my questions point by point. I have a
setup of 4 Linux machines: 1 name node, 1 job tracker, and 2 slaves
(each is a data node as well as a task tracker).
1. Should I edi
Whenever I try to start the DFS, I get this error:
had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh
starting namenode, logging to
/home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out
10.31.253.142: starting datanode, logging to
/home/hadoop/hadoop-0.19.1/bin/../logs/h
I have a namenode and job tracker on two different machines.
I see that the namenode tries to do an SSH login into itself (the name node),
the job tracker, as well as all the slave machines.
However, the job tracker tries to do an SSH login into the slave
machines only. Why this difference in behavior? Could someon
I have a few more questions about your answers. Please see them inline.
On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon wrote:
> On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote:
>>
>> 1. Should I edit conf/slaves on all nodes or only on name node? Do I
>> have to edit th
On Sat, Apr 4, 2009 at 6:46 PM, Foss User wrote:
> Whenever I try to start the DFS, I get this error:
>
> had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh
> starting namenode, logging to
> /home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out
> 10.31
I am trying to learn Hadoop, and a lot of questions come to my mind
as I learn it. So, I will be asking a few questions here from
time to time until I feel completely comfortable with it. Here are
some questions now:
1. Is it true that Hadoop should be installed on the same location on
all
I have a Hadoop cluster of 5 nodes: (1) Namenode (2) Job tracker (3)
First slave (4) Second Slave (5) Client from where I submit jobs
I brought system no. 4 down by running:
bin/hadoop-daemon.sh stop datanode
bin/hadoop-daemon.sh stop tasktracker
After this I tried running my word count job agai
On Sun, Apr 5, 2009 at 3:18 PM, Foss User wrote:
> I have a Hadoop cluster of 5 nodes: (1) Namenode (2) Job tracker (3)
> First slave (4) Second Slave (5) Client from where I submit jobs
>
> I brought system no. 4 down by running:
>
> bin/hadoop-daemon.sh stop datanode
> bin/
I created a Hadoop cluster. I created a folder in it called '/fossist'
and gave ownership of that folder only to the user called
'fossist'. Only 'fossist' has write permission on the folder called
'/fossist'.
However, I see that anyone can easily impersonate fossist in the
following mann
I have a Linux machine where I do not run a namenode or tasktracker, but
I have Hadoop installed. I use this machine to submit jobs to the
cluster. I see that the moment I add an /etc/hosts entry for my-namenode,
I get the following error:
foss...@cave:~/mcr-wordcount$ hadoop jar dist/mcr-wordcount-0.1.
In the documentation, I read that files are stored as file
splits in HDFS. What is the size of each file split? Is it
configurable? If yes, how can I configure it?
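From what I could gather so far, the relevant knob seems to be the dfs.block.size property (64 MB by default), which can be set in hadoop-site.xml or on the client configuration; a block size can apparently also be given when creating a single file. A rough sketch, assuming that property name and with placeholder paths and values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.block.size is the default HDFS block size (64 MB unless
        // overridden); setting it on the client config affects files
        // created with that config.
        conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB, illustrative

        // A block size can also be passed explicitly for a single file.
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"),  // placeholder path
                true,                                      // overwrite
                conf.getInt("io.file.buffer.size", 4096),  // buffer size
                (short) 3,                                 // replication
                128L * 1024 * 1024);                       // block size in bytes
        out.close();
    }
}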
Today I formatted the namenode while the namenode and jobtracker were
up. I found that I was still able to browse the file system using the
command: bin/hadoop dfs -lsr /
Then, I stopped the namenode and jobtracker and did a format again. I
started the namenode and jobtracker. I could still browse
Is it possible to sort the intermediate values for each key before
they reach the reducer?
Also, is it possible to sort the final output pairs from the
reducer before they are written into HDFS?
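From what I have read about the usual "secondary sort" trick, the wiring with the old API looks roughly like the sketch below; CompositeKey, KeyComparator, GroupComparator and NaturalKeyPartitioner are hypothetical classes one would have to write, with the value to sort by folded into the map output key.

import org.apache.hadoop.mapred.JobConf;

public class SecondarySortWiring {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // All four classes below are hypothetical and would need to be written.
        conf.setMapOutputKeyClass(CompositeKey.class);                 // natural key + value to sort by
        conf.setOutputKeyComparatorClass(KeyComparator.class);         // sorts on (key, value)
        conf.setOutputValueGroupingComparator(GroupComparator.class);  // groups on the natural key only
        conf.setPartitionerClass(NaturalKeyPartitioner.class);         // partitions on the natural key only
    }
}

As for the second question, my understanding is that each reducer already sees its keys in sorted order, so each reducer's output file comes out sorted by key within that partition.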
1. Do the reducers of a job start only after all mappers have finished?
2. Say there are 10 slave nodes. Let us say one of the nodes is very
slow compared to the other nodes. So, while the mappers on the other 9
nodes have finished in 2 minutes, the mapper on the slow node might take 20
minutes. Is Hadoop i
On Thu, May 7, 2009 at 12:44 AM, Todd Lipcon wrote:
> On Wed, May 6, 2009 at 11:40 AM, Foss User wrote:
>
>> Today I formatted the namenode while the namenode and jobtracker were
>> up. I found that I was still able to browse the file system using the
>> command: bin/hado
Thanks for your response. I have a few more questions regarding optimizations.
1. Do Hadoop clients locally cache the data they last requested?
2. Is the meta data for file blocks on the data nodes kept in the
underlying OS's file system on the namenode, or is it kept in the RAM of the
name node?
3. If no mapp
Thanks for your response again. I could not understand a few things in
your reply. So, I want to clarify them. Please find my questions
inline.
On Thu, May 7, 2009 at 2:28 AM, Todd Lipcon wrote:
> On Wed, May 6, 2009 at 1:46 PM, Foss User wrote:
>> 2. Is the meta data for file block
I have two reducers running on two different machines. I ran the
example word count program with some of my own System.out.println()
statements to see what is going on.
There were 2 slaves, each running a datanode as well as a tasktracker.
There was one namenode and one jobtracker. I know there is a ve
On Thu, May 7, 2009 at 8:51 PM, jason hadoop wrote:
> Most likely the 3rd mapper ran as a speculative execution, and it is
> possible that all of your keys hashed to a single partition. Also, if you
> don't specify, the default is to run a single reduce task.
As I mentioned in my first mail, I tri
I have written a rack awareness script which maps IP addresses to
rack names in this way:
10.31.1.* -> /room1/rack1
10.31.2.* -> /room1/rack2
10.31.3.* -> /room1/rack3
10.31.100.* -> /room2/rack1
10.31.200.* -> /room2/rack2
10.31.200.* -> /room2/rack3
I understand that DFS will try to have re
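To make the mapping concrete, the logic the script implements is roughly the following, written here as a plain Java helper purely for illustration (the real thing is the external script that, as far as I understand, Hadoop calls through topology.script.file.name):

public class RackMapping {
    // Illustration only: the same IP -> rack mapping as the list above.
    public static String resolve(String ip) {
        if (ip.startsWith("10.31.1."))   return "/room1/rack1";
        if (ip.startsWith("10.31.2."))   return "/room1/rack2";
        if (ip.startsWith("10.31.3."))   return "/room1/rack3";
        if (ip.startsWith("10.31.100.")) return "/room2/rack1";
        if (ip.startsWith("10.31.200.")) return "/room2/rack2";
        // Note: the list above maps 10.31.200.* a second time (to /room2/rack3);
        // only the first mapping can ever match here.
        return "/default-rack"; // fallback for anything unlisted
    }

    public static void main(String[] args) {
        System.out.println(resolve("10.31.100.7")); // prints /room2/rack1
    }
}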
I understand that the blocks are transferred between various nodes
using the HDFS protocol. I believe even the job classes are distributed
as files using the same HDFS protocol.
Is this protocol written over TCP/IP from scratch, or is this a
protocol that works on top of some other protocol like HTTP,
On Thu, May 7, 2009 at 3:10 AM, Owen O'Malley wrote:
>
> On May 6, 2009, at 12:15 PM, Foss User wrote:
>
>> Is it possible to sort the intermediate values for each key before
>> they reach the reducer?
>
> Look at the example SecondarySort.
Where can I f
On Fri, May 8, 2009 at 1:20 AM, Raghu Angadi wrote:
>
>
> Philip Zeyliger wrote:
>>
>> It's over TCP/IP, in a custom protocol. See DataXceiver.java. My sense is
>> that it's a custom protocol because Hadoop's IPC mechanism isn't optimized
>> for large messages.
>
> yes, and job classes are no
I was trying to write some Java code to copy a file from the local system to
a file system (which is also a local file system). This is my code.
package in.fossist.examples;
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
impo
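A minimal, self-contained sketch of what I am trying to do (the paths are placeholders; with fs.default.name left at its file:/// default, FileSystem.get() returns the local file system):

package in.fossist.examples;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalCopySketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // With fs.default.name at its default (file:///), this returns
        // the local file system.
        FileSystem fs = FileSystem.get(conf);

        // Placeholder paths: copy one local file to another local location.
        fs.copyFromLocalFile(new Path("/tmp/source.txt"),
                             new Path("/tmp/destination.txt"));
        fs.close();
    }
}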
On Fri, May 8, 2009 at 1:59 AM, Todd Lipcon wrote:
> On Thu, May 7, 2009 at 1:26 PM, Foss User wrote:
>
>> I was trying to write some Java code to copy a file from the local system to
>> a file system (which is also a local file system). This is my code.
>>
>> package in.
Sometimes I would like to just execute a certain method on all nodes.
The method does not need any input, so there is no need for an
InputFormat implementation class. I would want to just write a
Mapper implementation class with a map() method. But the problem with
the map() method is that it always
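What I have in mind is something like the sketch below: a Mapper whose map() simply ignores its key and value and calls the method I actually care about (doWork() is hypothetical), fed by a small dummy input file with one line per desired task.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: map() ignores its arguments and just calls doWork() once
// per input record (so a dummy input file controls how often it runs).
public class NoInputMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

    public void map(LongWritable key, Text value,
                    OutputCollector<NullWritable, NullWritable> output,
                    Reporter reporter) throws IOException {
        doWork(); // hypothetical method that needs no input
    }

    private void doWork() {
        // ... whatever needs to run on the node ...
    }
}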
On Thu, May 7, 2009 at 9:45 PM, jason hadoop wrote:
> If you have it available still, via the job tracker web interface, attach
> the per job xml configuration
Job Configuration: JobId - job_200905071619_0003
name           value
fs.s3n.impl    org.apache.hadoop.fs.s3native.NativeS3FileSystem
mapred.t
I know that if a file is very large, it will be split into blocks and
the blocks will be spread out across various data nodes. I want to know
whether I can find out, through the GUI or logs, exactly which data
nodes contain which blocks of a particular huge text file.
On Tue, May 19, 2009 at 12:53 PM, Ravi Phulari wrote:
> If you have hadoop superuser/administrative permissions, you can use fsck
> with the correct options to view the block report and locations for every block.
>
> For further information please refer -
> http://hadoop.apache.org/core/docs/r0.20.0/comma
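If I understand the suggestion, the invocation would look something like this (the path is a placeholder):

bin/hadoop fsck /user/fossist/huge-file.txt -files -blocks -locations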
I ran a job. In the jobtracker web interface, I found 4 maps and 1
reduce running. This is not what I set in my configuration files
(hadoop-site.xml).
My configuration file is set as follows:
mapred.map.tasks = 2
mapred.reduce.tasks = 2
However, the description of these properties mentions that t
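For completeness, the per-job equivalents of these two properties in the old API would be something like the sketch below (if I read the property descriptions right, mapred.map.tasks is only a hint, while mapred.reduce.tasks is taken as given):

import org.apache.hadoop.mapred.JobConf;

public class TaskCountSketch {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setNumMapTasks(2);    // mapred.map.tasks: a hint only; actual count follows the input splits
        conf.setNumReduceTasks(2); // mapred.reduce.tasks: taken as given
    }
}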
On Tue, May 19, 2009 at 5:32 PM, Piotr Praczyk wrote:
> Hi
>
> Your job configuration file specifies exactly the numbers of mappers and
> reducers that are running in your system. The job configuration overrides
> site configuration (if parameters are not specified as final) as far as I
> know.
On Tue, May 19, 2009 at 8:04 PM, He Chen wrote:
> change the following parameters
> mapred.reduce.max.attempts 4
> mapred.reduce.tasks 1
> to
> mapred.reduce.max.attempts 2
> mapred.reduce.tasks 2
> in your program source code!

If these parameters in hadoop-site.xml are always going to b
On Tue, May 19, 2009 at 8:23 PM, He Chen wrote:
> I think they are not overridden every time. If you do not give any
> configuration in your source code, the hadoop-site.xml will help you
> configure the framework. At the same time, you will not configure all the
> parameters of the hadoop framewor
I ran a job. In the jobtracker web interface, I found 4 maps and 1
reduce running. This is not what I set in my configuration files
(hadoop-site.xml).
My configuration file, conf/hadoop-site.xml, is set as follows:
mapred.map.tasks = 2
mapred.reduce.tasks = 2
However, the description of these pro
When we see the job details on the job tracker web interface, we see
"bytes read" as well as "local bytes read". What is the difference
between the two?
On Wed, May 20, 2009 at 1:52 AM, Piotr Praczyk wrote:
> After the first mail I understood that you are providing an additional
> job.xml (which can be done).
> What version of Hadoop do you use? In 0.20 there was some change in the
> configuration files - as far as I understood from the messages,
> hadoo
On Wed, May 20, 2009 at 3:39 AM, Chuck Lam wrote:
> Can you set the number of reducers to zero and see if it becomes a map-only
> job? If it does, then it's able to read in the mapred.reduce.tasks property
> correctly but just refuses to have 2 reducers. In that case, it's most likely
> you're runn
On Wed, May 20, 2009 at 3:18 PM, Tom White wrote:
> The number of maps to use is calculated on the client, since splits
> are computed on the client, so changing the value of mapred.map.tasks
> only on the jobtracker will not have any effect.
>
> Note that the number of map tasks that you set is o