Question on distribution of classes and jobs

2009-04-03 Thread Foss User
If I have written a WordCount.java job in this manner:

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Combine.class);
    conf.setReducerClass(Reduce.class);

So, you can see that three classes are being used here. I have packaged these classes into a jar file called wc
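
For reference, a minimal driver wiring these three classes together might look like the sketch below. Map, Combine and Reduce are the poster's classes; the paths and everything else are assumptions, following the standard WordCount pattern for the old (0.19-era) mapred API:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCount {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class); // ships the enclosing jar
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);         // the poster's Mapper
            conf.setCombinerClass(Combine.class);   // the poster's Combiner
            conf.setReducerClass(Reduce.class);     // the poster's Reducer
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }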

Newbie questions on Hadoop topology

2009-04-04 Thread Foss User
I was going through the tutorial here: http://hadoop.apache.org/core/docs/current/cluster_setup.html Certain things are not clear, so I am asking them point-wise. I have a setup of 4 Linux machines: 1 name node, 1 job tracker, and 2 slaves (each is a data node as well as a task tracker). 1. Should I edi

NullPointerException while starting start-dfs.sh

2009-04-04 Thread Foss User
Whenever I try to start the DFS, I get this error:

    had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh
    starting namenode, logging to /home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out
    10.31.253.142: starting datanode, logging to /home/hadoop/hadoop-0.19.1/bin/../logs/h

Why namenode logs into itself as well as job tracker?

2009-04-04 Thread Foss User
I have a namenode and a job tracker on two different machines. I see that the namenode tries to ssh into itself (the name node), into the job tracker, as well as into all slave machines. However, the job tracker tries to ssh into the slave machines only. Why this difference in behavior? Could someon

Re: Newbie questions on Hadoop topology

2009-04-04 Thread Foss User
I have a few more questions on your answers. Please see them inline. On Sun, Apr 5, 2009 at 10:27 AM, Todd Lipcon wrote: > On Sat, Apr 4, 2009 at 3:47 AM, Foss User wrote: >> >> 1. Should I edit conf/slaves on all nodes or only on name node? Do I >> have to edit th

Re: NullPointerException while starting start-dfs.sh

2009-04-04 Thread Foss User
On Sat, Apr 4, 2009 at 6:46 PM, Foss User wrote: > Whenever I try to start the DFS, I get this error: > > had...@namenode:~/hadoop-0.19.1$ bin/start-dfs.sh > starting namenode, logging to > /home/hadoop/hadoop-0.19.1/bin/../logs/hadoop-hadoop-namenode-hadoop-namenode.out > 10.31

Newbie questions on Hadoop local directories?

2009-04-05 Thread Foss User
I am trying to learn Hadoop, and a lot of questions come to mind as I do. So, I will be asking a few questions here from time to time until I feel completely comfortable with it. Here are some questions now: 1. Is it true that Hadoop should be installed at the same location on all

After a node goes down, I can't run jobs

2009-04-05 Thread Foss User
I have a Hadoop cluster of 5 nodes: (1) Namenode (2) Job tracker (3) First slave (4) Second slave (5) Client from where I submit jobs. I brought system no. 4 down by running:

    bin/hadoop-daemon.sh stop datanode
    bin/hadoop-daemon.sh stop tasktracker

After this I tried running my word count job agai

Re: After a node goes down, I can't run jobs

2009-04-05 Thread Foss User
On Sun, Apr 5, 2009 at 3:18 PM, Foss User wrote: > I have a Hadoop cluster of 5 nodes: (1) Namenode (2) Job tracker (3) > First slave (4) Second Slave (5) Client from where I submit jobs > > I brought system no. 4 down by running: > > bin/hadoop-daemon.sh stop datanode > bin/

Users are not properly authenticated in Hadoop

2009-04-05 Thread Foss User
I created a Hadoop cluster. I created a folder in it called '/fossist' and gave ownership of that folder only to the user called 'fossist'. Only 'fossist' has write permission on the folder called '/fossist'. However, I see that anyone can easily impersonate fossist in the following mann

Setting /etc/hosts entry for namenode causes job submission failure

2009-04-05 Thread Foss User
I have a Linux machine where I do not run a namenode or tasktracker, but I have Hadoop installed. I use this machine to submit jobs to the cluster. I see that the moment I put an /etc/hosts entry for my-namenode, I get the following error:

    foss...@cave:~/mcr-wordcount$ hadoop jar dist/mcr-wordcount-0.1.

How large is one file split?

2009-04-14 Thread Foss User
I was reading in the documentation that files are stored as file splits in HDFS. What is the size of each file split? Is it configurable? If yes, how can I configure it?
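
For context, what HDFS actually stores are blocks (64 MB each by default in this era), and an input split usually corresponds to one block. The block size is set by dfs.block.size in hadoop-site.xml; a sketch with an illustrative 128 MB value:

    <!-- hadoop-site.xml: raise the HDFS block size to 128 MB (134217728 bytes) -->
    <property>
      <name>dfs.block.size</name>
      <value>134217728</value>
    </property>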

Folders and files still present after format

2009-05-06 Thread Foss User
Today I formatted the namenode while the namenode and jobtracker were up. I found that I was still able to browse the file system using the command: bin/hadoop dfs -lsr / Then, I stopped the namenode and jobtracker and did a format again. I started the namenode and jobtracker. I could still browse

Is it possible to sort intermediate values and final values?

2009-05-06 Thread Foss User
Is it possible to sort the intermediate values for each key before the pairs reach the reducer? Also, is it possible to sort the final output pairs from the reducer before they are written to HDFS?
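
The usual answer for intermediate values is a secondary sort: promote the value (or part of it) into a composite key, then control sorting and grouping separately. A hedged driver-side sketch; CompositeKey, FullKeyComparator and NaturalKeyGroupingComparator are hypothetical classes you would write, and only the JobConf calls are real API:

    // Fragment of a job driver (old mapred API); MyJob is hypothetical.
    JobConf conf = new JobConf(MyJob.class);
    conf.setMapOutputKeyClass(CompositeKey.class);    // natural key + value part
    // sort order of keys arriving at the reducer: natural key, then value part
    conf.setOutputKeyComparatorClass(FullKeyComparator.class);
    // grouping for reduce() calls: compare on the natural key only
    conf.setOutputValueGroupingComparator(NaturalKeyGroupingComparator.class);

For the final output, note that reduce() is invoked in key order, so each reducer's part file already comes out sorted by key; sorting by anything else generally takes a second pass.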

About Hadoop optimizations

2009-05-06 Thread Foss User
1. Do the reducers of a job start only after all mappers have finished? 2. Say there are 10 slave nodes, and one of the nodes is very slow compared to the others. So, while the mappers on the other 9 have finished in 2 minutes, the one on the slow node might take 20 minutes. Is Hadoop i
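
On (2), this is the scenario speculative execution is meant for: the framework can launch a backup attempt of a straggling task on another node. It can be toggled per job; a sketch using setters that exist on org.apache.hadoop.mapred.JobConf in this era:

    // Fragment of a job driver; MyJob is hypothetical.
    JobConf conf = new JobConf(MyJob.class);
    conf.setMapSpeculativeExecution(true);    // mapred.map.tasks.speculative.execution
    conf.setReduceSpeculativeExecution(true); // mapred.reduce.tasks.speculative.execution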

Re: Folders and files still present after format

2009-05-06 Thread Foss User
On Thu, May 7, 2009 at 12:44 AM, Todd Lipcon wrote: > On Wed, May 6, 2009 at 11:40 AM, Foss User wrote: > >> Today I formatted the namenode while the namenode and jobtracker was >> up. I found that I was still able to browse the file system using the >> command: bin/hado

Re: About Hadoop optimizations

2009-05-06 Thread Foss User
Thanks for your response. I got a few more questions regarding optimizations. 1. Do Hadoop clients locally cache the data they last requested? 2. Is the metadata for the file blocks on the data nodes kept in the underlying OS's file system on the namenode, or is it kept in the RAM of the name node? 3. If no mapp

Re: About Hadoop optimizations

2009-05-06 Thread Foss User
Thanks for your response again. I could not understand a few things in your reply. So, I want to clarify them. Please find my questions inline. On Thu, May 7, 2009 at 2:28 AM, Todd Lipcon wrote: > On Wed, May 6, 2009 at 1:46 PM, Foss User wrote: >> 2. Is the meta data for file block

All keys went to single reducer in WordCount program

2009-05-07 Thread Foss User
I have two reducers running on two different machines. I ran the example word count program with some of my own System.out.println() statements to see what is going on. There were 2 slaves each running datanode as well as tasktracker. There was one namenode and one jobtracker. I know there is a ve
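
For reference, with no explicit setting a job gets a single reduce task, and keys are spread over reducers by the default HashPartitioner. The class below is only illustrative, but it mirrors the real formula the default uses:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Mirrors default hash partitioning: every occurrence of a given word
    // goes to the same partition, so with few distinct keys (or only one
    // configured reducer) everything can land on a single reduce task.
    public class WordPartitioner implements Partitioner<Text, IntWritable> {
        public void configure(JobConf job) {}
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

To actually use two reducers, the job must also ask for them, e.g. conf.setNumReduceTasks(2).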

Re: All keys went to single reducer in WordCount program

2009-05-07 Thread Foss User
On Thu, May 7, 2009 at 8:51 PM, jason hadoop wrote: > Most likely the 3rd mapper ran as a speculative execution, and it is > possible that all of your keys hashed to a single partition. Also, if you > don't specify otherwise, the default is to run a single reduce task. As I mentioned in my first mail, I tri

Is "/room1" in the rack name "/room1/rack1" significant during replication?

2009-05-07 Thread Foss User
I have written a rack awareness script which maps the IP addresses to rack names in this way:

    10.31.1.*   -> /room1/rack1
    10.31.2.*   -> /room1/rack2
    10.31.3.*   -> /room1/rack3
    10.31.100.* -> /room2/rack1
    10.31.200.* -> /room2/rack2
    10.31.200.* -> /room2/rack3

I understand that DFS will try to have re
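
For reference, such a script is plugged in via the topology.script.file.name property; Hadoop passes one or more IPs/hostnames as arguments and expects one rack path per argument on stdout. A minimal sketch using this thread's example ranges:

    #!/bin/sh
    # Hypothetical topology script: map each argument to a rack path.
    for host in "$@"; do
      case "$host" in
        10.31.1.*)   echo "/room1/rack1" ;;
        10.31.2.*)   echo "/room1/rack2" ;;
        10.31.3.*)   echo "/room1/rack3" ;;
        10.31.100.*) echo "/room2/rack1" ;;
        *)           echo "/default-rack" ;;
      esac
    done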

Is HDFS protocol written from scratch?

2009-05-07 Thread Foss User
I understand that the blocks are transferred between various nodes using the HDFS protocol. I believe even the job classes are distributed as files using the same HDFS protocol. Is this protocol written over TCP/IP from scratch, or is it a protocol that works on top of some other protocol like HTTP,

Re: Is it possible to sort intermediate values and final values?

2009-05-07 Thread Foss User
On Thu, May 7, 2009 at 3:10 AM, Owen O'Malley wrote: > > On May 6, 2009, at 12:15 PM, Foss User wrote: > >> Is it possible to sort the intermediate values for each key before >> they pair reaches the reducer? > > Look at the example SecondarySort. Where can I f

Re: Is HDFS protocol written from scratch?

2009-05-07 Thread Foss User
On Fri, May 8, 2009 at 1:20 AM, Raghu Angadi wrote: > > > Philip Zeyliger wrote: >> >> It's over TCP/IP, in a custom protocol. See DataXceiver.java. My sense is >> that it's a custom protocol because Hadoop's IPC mechanism isn't optimized >> for large messages. > > yes, and job classes are no

NullPointerException while trying to copy file

2009-05-07 Thread Foss User
I was trying to write Java code to copy a file from the local system to a file system (which is also the local file system). This is my code:

    package in.fossist.examples;

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    impo
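
A minimal working version of such a copy, as a sketch (the paths are hypothetical; one common cause of NPEs here is operating on a FileSystem obtained without a usable Configuration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Explicitly ask for the local file system implementation.
            FileSystem fs = FileSystem.getLocal(conf);
            fs.copyFromLocalFile(new Path("/tmp/in.txt"),   // hypothetical source
                                 new Path("/tmp/out.txt")); // hypothetical destination
        }
    }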

Re: NullPointerException while trying to copy file

2009-05-07 Thread Foss User
On Fri, May 8, 2009 at 1:59 AM, Todd Lipcon wrote: > On Thu, May 7, 2009 at 1:26 PM, Foss User wrote: > >> I was trying to write a Java code to copy a file from local system to >> a file system (which is also local file system). This is my code. >> >> package in.

How to write a map() method that needs no input?

2009-05-07 Thread Foss User
Sometimes I would like to just execute a certain method on all nodes. The method does not need any input, so there is no need for an InputFormat implementation class; I would want to just write a Mapper implementation class with a map() method. But the problem with the map() method is that it always
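
One common workaround is to feed the job a small dummy input and have the Mapper ignore its records, doing the real work once per task in close(); a sketch with the old API (doNodeLocalWork is hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // The input records are only a trigger; the real work runs in close().
    public class NoInputMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

        public void map(LongWritable key, Text value,
                        OutputCollector<NullWritable, NullWritable> output,
                        Reporter reporter) {
            // deliberately empty
        }

        public void close() throws IOException {
            doNodeLocalWork(); // hypothetical input-free work
        }

        private void doNodeLocalWork() { /* ... */ }
    }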

Re: All keys went to single reducer in WordCount program

2009-05-07 Thread Foss User
On Thu, May 7, 2009 at 9:45 PM, jason hadoop wrote: > If you have it available still, via the job tracker web interface, attach > the per-job xml configuration

Job Configuration: JobId - job_200905071619_0003

    name         value
    fs.s3n.impl  org.apache.hadoop.fs.s3native.NativeS3FileSystem
    mapred.t

Finding where the file blocks are

2009-05-19 Thread Foss User
I know that if a file is very large, it will be split into blocks and the blocks will be spread out over various data nodes. I want to know whether I can find out, through a GUI or logs, exactly which data nodes contain which blocks of a particular huge text file.

Re: Finding where the file blocks are

2009-05-19 Thread Foss User
On Tue, May 19, 2009 at 12:53 PM, Ravi Phulari wrote: > If you have Hadoop superuser/administrative permissions you can use fsck > with the correct options to view the block report and locations for every block. > > For further information please refer to > http://hadoop.apache.org/core/docs/r0.20.0/comma
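
For reference, the fsck invocation in question looks like this (the path is hypothetical):

    # list the blocks of a file and the datanodes holding each replica
    bin/hadoop fsck /user/foss/bigfile.txt -files -blocks -locations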

Number of maps and reduces not obeying my configuration

2009-05-19 Thread Foss User
I ran a job. In the jobtracker web interface, I found 4 maps and 1 reduce running. This is not what I set in my configuration file (hadoop-site.xml), which is set as follows:

    mapred.map.tasks = 2
    mapred.reduce.tasks = 2

However, the description of these properties mentions that t
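
Context that recurs in the replies below: mapred.map.tasks is only a hint (the actual number of maps is derived from the input splits), while the reduce count is taken from the job's own configuration; a sketch:

    // Fragment of a job driver.
    JobConf conf = new JobConf(WordCount.class);
    conf.setNumReduceTasks(2); // honored: exactly 2 reduce tasks
    conf.setNumMapTasks(2);    // only a hint: the splits decide the real map count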

Re: Number of maps and reduces not obeying my configuration

2009-05-19 Thread Foss User
On Tue, May 19, 2009 at 5:32 PM, Piotr Praczyk wrote: > Hi > > Your job configuration file specifies exactly the numbers of mappers and > reducers that are running in your system. The job configuration overrides > site configuration (if parameters are not specified as final) as far as I > know.
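
For reference, "final" here is literal Hadoop config syntax: a site-level property marked final cannot be overridden by a job's configuration, e.g. in hadoop-site.xml:

    <property>
      <name>mapred.reduce.tasks</name>
      <value>2</value>
      <final>true</final>
    </property>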

Re: Number of maps and reduces not obeying my configuration

2009-05-19 Thread Foss User
On Tue, May 19, 2009 at 8:04 PM, He Chen wrote: > change the following parameters > mapred.reduce.max.attempts 4 > mapred.reduce.tasks 1 > to > mapred.reduce.max.attempts 2 > mapred.reduce.tasks 2 > in your program source code! If these parameters in hadoop-site.xml are always going to b

Re: Number of maps and reduces not obeying my configuration

2009-05-19 Thread Foss User
On Tue, May 19, 2009 at 8:23 PM, He Chen wrote: > I think they are not overridden every time. If you do not give any > configuration in your source code, hadoop-site.xml will help you > configure the framework. At the same time, you will not configure all the > parameters of the hadoop framewor

My configuration in conf/hadoop-site.xml is not being used. Why?

2009-05-19 Thread Foss User
I ran a job. In the jobtracker web interface, I found 4 maps and 1 reduce running. This is not what I set in my configuration file. My configuration file, conf/hadoop-site.xml, is set as follows:

    mapred.map.tasks = 2
    mapred.reduce.tasks = 2

However, the description of these pro

Difference between bytes read and local bytes read?

2009-05-19 Thread Foss User
When we see the job details on the job tracker web interface, we see "bytes read" as well as "local bytes read". What is the difference between the two?

Re: Number of maps and reduces not obeying my configuration

2009-05-19 Thread Foss User
On Wed, May 20, 2009 at 1:52 AM, Piotr Praczyk wrote: > After a first mail I understood that you are providing additional job.xml ( > which can be done). > What version of Hadoop do you use ? In 0.20 there was some change in > configuration files - as far as I understood from the messages, > hadoo

Re: Number of maps and reduces not obeying my configuration

2009-05-19 Thread Foss User
On Wed, May 20, 2009 at 3:39 AM, Chuck Lam wrote: > Can you set the number of reducers to zero and see if it becomes a map-only > job? If it does, then it's able to read in the mapred.reduce.tasks property > correctly but just refuses to have 2 reducers. In that case, it's most likely > you're runn

Re: Number of maps and reduces not obeying my configuration

2009-05-20 Thread Foss User
On Wed, May 20, 2009 at 3:18 PM, Tom White wrote: > The number of maps to use is calculated on the client, since splits > are computed on the client, so changing the value of mapred.map.tasks > only on the jobtracker will not have any effect. > > Note that the number of map tasks that you set is o