Re: Adding nodes

2012-03-01 Thread George Datskos

Mohit,

New datanodes will connect to the namenode, so that's how the namenode 
knows.  Just make sure the datanodes have the correct fs.default.name 
in their core-site.xml and then start them.  The namenode can, however, 
choose to reject a datanode if you are using the dfs.hosts and 
dfs.hosts.exclude settings in the namenode's hdfs-site.xml.
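
For reference, a sketch of those two settings on the namenode (the file 
paths below are placeholders, use whatever matches your layout):

  <!-- namenode hdfs-site.xml -->
  <property>
    <name>dfs.hosts</name>
    <!-- allow list: one permitted datanode hostname per line -->
    <value>/etc/hadoop/conf/dfs.hosts</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <!-- deny list: datanodes to reject or decommission -->
    <value>/etc/hadoop/conf/dfs.hosts.exclude</value>
  </property>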


The namenode doesn't actually care about the slaves file.  It's only 
used by the start/stop scripts.
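
For example (hostnames below are made up), conf/slaves is just a plain 
list that the wrapper scripts iterate over:

  # conf/slaves -- one worker hostname per line
  slave01.example.com
  slave02.example.com

bin/start-dfs.sh starts the namenode locally, then ssh-es to each host 
listed in conf/slaves and starts a datanode there.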



On 2012/03/02 10:35, Mohit Anchlia wrote:

I actually meant to ask: how does the namenode/jobtracker know there is a new
node in the cluster? Is it initiated by the namenode when the slaves file is
edited? Or is it initiated by the tasktracker when the tasktracker is started?






Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
Thanks all for the answers!!

On Thu, Mar 1, 2012 at 5:52 PM, Arpit Gupta  wrote:

> It is initiated by the slave.
>
> If you have defined files to state which slaves can talk to the namenode
> (using config dfs.hosts) and which hosts cannot (using
> property dfs.hosts.exclude) then you would need to edit these files and
> issue the refresh command.
>
>
>  On Mar 1, 2012, at 5:35 PM, Mohit Anchlia wrote:
>
>  On Thu, Mar 1, 2012 at 4:57 PM, Joey Echeverria 
> wrote:
>
> Not quite. Datanodes get the namenode host from fs.default.name in
>
> core-site.xml. Task trackers find the job tracker from the
>
> mapred.job.tracker setting in mapred-site.xml.
>
>
>
> I actually meant to ask: how does the namenode/jobtracker know there is a new
> node in the cluster? Is it initiated by the namenode when the slaves file is
> edited? Or is it initiated by the tasktracker when the tasktracker is started?
>
>
> Sent from my iPhone
>
>
> On Mar 1, 2012, at 18:49, Mohit Anchlia  wrote:
>
>
>  On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria 
>
> wrote:
>
>
>  You only have to refresh nodes if you're making use of an allow file.
>
>
>  Thanks. Does it mean that when the tasktracker/datanode starts up it
>
>  communicates with the namenode using the master file?
>
>
>  Sent from my iPhone
>
>
>  On Mar 1, 2012, at 18:29, Mohit Anchlia  wrote:
>
>
>   Is this the right procedure to add nodes? I took some from hadoop wiki
>
>  FAQ:
>
>
>   http://wiki.apache.org/hadoop/FAQ
>
>
>   1. Update conf/slaves
>
>   2. on the slave nodes start datanode and tasktracker
>
>   3. hadoop balancer
>
>
>   Do I also need to run dfsadmin -refreshNodes?
>
>
>
>
>
> --
> Arpit
> Hortonworks, Inc.
> email: ar...@hortonworks.com


Re: Adding nodes

2012-03-01 Thread Arpit Gupta
It is initiated by the slave. If you have defined files to state which slaves
can talk to the namenode (using config dfs.hosts) and which hosts cannot
(using property dfs.hosts.exclude) then you would need to edit these files
and issue the refresh command.

On Mar 1, 2012, at 5:35 PM, Mohit Anchlia wrote:

> I actually meant to ask: how does the namenode/jobtracker know there is a new
> node in the cluster? Is it initiated by the namenode when the slaves file is
> edited? Or is it initiated by the tasktracker when the tasktracker is started?

--
Arpit
Hortonworks, Inc.
email: ar...@hortonworks.com



Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
What Joey said is correct for both the Apache and Cloudera distros. The DN/TT 
daemons will connect to the NN/JT using the config files. The masters and slaves 
files are used for starting the correct daemons.



>
> From: anil gupta 
>To: common-user@hadoop.apache.org; Raj Vishwanathan  
>Sent: Thursday, March 1, 2012 5:42 PM
>Subject: Re: Adding nodes
> 
>Whatever Joey said is correct for Cloudera's distribution. I am not
>confident about other distributions as I haven't tried them.
>
>Thanks,
>Anil
>
>On Thu, Mar 1, 2012 at 5:10 PM, Raj Vishwanathan  wrote:
>
>> The masters and slaves files, if I remember correctly, are used to start the
>> correct daemons on the correct nodes from the master node.
>>
>>
>> Raj
>>
>>
>> >
>> > From: Joey Echeverria 
>> >To: "common-user@hadoop.apache.org" 
>> >Cc: "common-user@hadoop.apache.org" 
>> >Sent: Thursday, March 1, 2012 4:57 PM
>> >Subject: Re: Adding nodes
>> >
>> >Not quite. Datanodes get the namenode host from fs.default.name in
>> core-site.xml. Task trackers find the job tracker from the
>> mapred.job.tracker setting in mapred-site.xml.
>> >
>> >Sent from my iPhone
>> >
>> >On Mar 1, 2012, at 18:49, Mohit Anchlia  wrote:
>> >
>> >> On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria 
>> wrote:
>> >>
>> >>> You only have to refresh nodes if you're making use of an allow file.
>> >>>
>> >>> Thanks. Does it mean that when the tasktracker/datanode starts up it
>> >> communicates with the namenode using the master file?
>> >>
>> >> Sent from my iPhone
>> >>>
>> >>> On Mar 1, 2012, at 18:29, Mohit Anchlia 
>> wrote:
>> >>>
>> >>>> Is this the right procedure to add nodes? I took some from hadoop wiki
>> >>> FAQ:
>> >>>>
>> >>>> http://wiki.apache.org/hadoop/FAQ
>> >>>>
>> >>>> 1. Update conf/slaves
>> >>>> 2. on the slave nodes start datanode and tasktracker
>> >>>> 3. hadoop balancer
>> >>>>
>> >>>> Do I also need to run dfsadmin -refreshNodes?
>> >>>
>> >
>> >
>> >
>>
>
>
>
>-- 
>Thanks & Regards,
>Anil Gupta
>
>
>

Re: Adding nodes

2012-03-01 Thread anil gupta
Whatever Joey said is correct for Cloudera's distribution. I am not
confident about other distributions as I haven't tried them.

Thanks,
Anil

On Thu, Mar 1, 2012 at 5:10 PM, Raj Vishwanathan  wrote:

> The masters and slaves files, if I remember correctly, are used to start the
> correct daemons on the correct nodes from the master node.
>
>
> Raj
>
>
> >
> > From: Joey Echeverria 
> >To: "common-user@hadoop.apache.org" 
> >Cc: "common-user@hadoop.apache.org" 
> >Sent: Thursday, March 1, 2012 4:57 PM
> >Subject: Re: Adding nodes
> >
> >Not quite. Datanodes get the namenode host from fs.default.name in
> core-site.xml. Task trackers find the job tracker from the
> mapred.job.tracker setting in mapred-site.xml.
> >
> >Sent from my iPhone
> >
> >On Mar 1, 2012, at 18:49, Mohit Anchlia  wrote:
> >
> >> On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria 
> wrote:
> >>
> >>> You only have to refresh nodes if you're making use of an allow file.
> >>>
> >>> Thanks. Does it mean that when the tasktracker/datanode starts up it
> >> communicates with the namenode using the master file?
> >>
> >> Sent from my iPhone
> >>>
> >>> On Mar 1, 2012, at 18:29, Mohit Anchlia 
> wrote:
> >>>
> >>>> Is this the right procedure to add nodes? I took some from hadoop wiki
> >>> FAQ:
> >>>>
> >>>> http://wiki.apache.org/hadoop/FAQ
> >>>>
> >>>> 1. Update conf/slaves
> >>>> 2. on the slave nodes start datanode and tasktracker
> >>>> 3. hadoop balancer
> >>>>
> >>>> Do I also need to run dfsadmin -refreshNodes?
> >>>
> >
> >
> >
>



-- 
Thanks & Regards,
Anil Gupta


Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:57 PM, Joey Echeverria  wrote:

> Not quite. Datanodes get the namenode host from fs.default.name in
> core-site.xml. Task trackers find the job tracker from the
> mapred.job.tracker setting in mapred-site.xml.
>

I actually meant to ask: how does the namenode/jobtracker know there is a new
node in the cluster? Is it initiated by the namenode when the slaves file is
edited? Or is it initiated by the tasktracker when the tasktracker is started?

>
> Sent from my iPhone
>
> On Mar 1, 2012, at 18:49, Mohit Anchlia  wrote:
>
> > On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria 
> wrote:
> >
> >> You only have to refresh nodes if you're making use of an allow file.
> >>
> >> Thanks. Does it mean that when the tasktracker/datanode starts up it
> > communicates with the namenode using the master file?
> >
> > Sent from my iPhone
> >>
> >> On Mar 1, 2012, at 18:29, Mohit Anchlia  wrote:
> >>
> >>> Is this the right procedure to add nodes? I took some from hadoop wiki
> >> FAQ:
> >>>
> >>> http://wiki.apache.org/hadoop/FAQ
> >>>
> >>> 1. Update conf/slaves
> >>> 2. on the slave nodes start datanode and tasktracker
> >>> 3. hadoop balancer
> >>>
> >>> Do I also need to run dfsadmin -refreshNodes?
> >>
>


Re: Adding nodes

2012-03-01 Thread Raj Vishwanathan
The masters and slaves files, if I remember correctly, are used to start the 
correct daemons on the correct nodes from the master node.


Raj


>
> From: Joey Echeverria 
>To: "common-user@hadoop.apache.org"  
>Cc: "common-user@hadoop.apache.org"  
>Sent: Thursday, March 1, 2012 4:57 PM
>Subject: Re: Adding nodes
> 
>Not quite. Datanodes get the namenode host from fs.default.name in 
>core-site.xml. Task trackers find the job tracker from the mapred.job.tracker 
>setting in mapred-site.xml. 
>
>Sent from my iPhone
>
>On Mar 1, 2012, at 18:49, Mohit Anchlia  wrote:
>
>> On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria  wrote:
>> 
>>> You only have to refresh nodes if you're making use of an allow file.
>>> 
>>> Thanks. Does it mean that when the tasktracker/datanode starts up it
>> communicates with the namenode using the master file?
>> 
>> Sent from my iPhone
>>> 
>>> On Mar 1, 2012, at 18:29, Mohit Anchlia  wrote:
>>> 
>>>> Is this the right procedure to add nodes? I took some from hadoop wiki
>>> FAQ:
>>>> 
>>>> http://wiki.apache.org/hadoop/FAQ
>>>> 
>>>> 1. Update conf/slaves
>>>> 2. on the slave nodes start datanode and tasktracker
>>>> 3. hadoop balancer
>>>> 
>>>> Do I also need to run dfsadmin -refreshNodes?
>>> 
>
>
>

Re: Adding nodes

2012-03-01 Thread Joey Echeverria
Not quite. Datanodes get the namenode host from fs.default.name in 
core-site.xml. Task trackers find the job tracker from the mapred.job.tracker 
setting in mapred-site.xml. 
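
For example, the two entries look something like this (host names and ports 
below are placeholders):

  <!-- core-site.xml on each datanode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>

  <!-- mapred-site.xml on each tasktracker -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>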

Sent from my iPhone

On Mar 1, 2012, at 18:49, Mohit Anchlia  wrote:

> On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria  wrote:
> 
>> You only have to refresh nodes if you're making use of an allow file.
>> 
>> Thanks. Does it mean that when the tasktracker/datanode starts up it
> communicates with the namenode using the master file?
> 
> Sent from my iPhone
>> 
>> On Mar 1, 2012, at 18:29, Mohit Anchlia  wrote:
>> 
>>> Is this the right procedure to add nodes? I took some from hadoop wiki
>> FAQ:
>>> 
>>> http://wiki.apache.org/hadoop/FAQ
>>> 
>>> 1. Update conf/slaves
>>> 2. on the slave nodes start datanode and tasktracker
>>> 3. hadoop balancer
>>> 
>>> Do I also need to run dfsadmin -refreshNodes?
>> 


Re: Adding nodes

2012-03-01 Thread Mohit Anchlia
On Thu, Mar 1, 2012 at 4:46 PM, Joey Echeverria  wrote:

> You only have to refresh nodes if you're making use of an allow file.
>
> Thanks. Does it mean that when the tasktracker/datanode starts up it
communicates with the namenode using the master file?

Sent from my iPhone
>
> On Mar 1, 2012, at 18:29, Mohit Anchlia  wrote:
>
> > Is this the right procedure to add nodes? I took some from hadoop wiki
> FAQ:
> >
> > http://wiki.apache.org/hadoop/FAQ
> >
> > 1. Update conf/slaves
> > 2. on the slave nodes start datanode and tasktracker
> > 3. hadoop balancer
> >
> > Do I also need to run dfsadmin -refreshNodes?
>


Re: Adding nodes

2012-03-01 Thread Joey Echeverria
You only have to refresh nodes if you're making use of an allow (dfs.hosts) file. 
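
In that case the sequence is just something like this (the hostname and 
file path below are made up; use whatever your dfs.hosts points at):

  # on the namenode: add the new node to the allow file, then refresh
  echo slave07.example.com >> /etc/hadoop/conf/dfs.hosts
  hadoop dfsadmin -refreshNodes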

Sent from my iPhone

On Mar 1, 2012, at 18:29, Mohit Anchlia  wrote:

> Is this the right procedure to add nodes? I took some from hadoop wiki FAQ:
> 
> http://wiki.apache.org/hadoop/FAQ
> 
> 1. Update conf/slaves
> 2. on the slave nodes start datanode and tasktracker
> 3. hadoop balancer
> 
> Do I also need to run dfsadmin -refreshNodes?


Adding nodes

2012-03-01 Thread Mohit Anchlia
Is this the right procedure to add nodes? I took some from the hadoop wiki FAQ:

http://wiki.apache.org/hadoop/FAQ

1. Update conf/slaves
2. On the slave nodes, start the datanode and tasktracker
3. Run hadoop balancer

Do I also need to run dfsadmin -refreshNodes?
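
For concreteness, the steps I have in mind look like this (hostnames below 
are made up):

  # 1. on the master: add the new host to conf/slaves
  echo slave07.example.com >> conf/slaves

  # 2. on the new slave: start the daemons
  bin/hadoop-daemon.sh start datanode
  bin/hadoop-daemon.sh start tasktracker

  # 3. spread existing blocks onto the new datanode
  bin/hadoop balancer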


Re: Dynamically adding nodes in Hadoop

2012-01-03 Thread madhu phatak
Thanks for all the input. I am trying to do a cluster setup on EC2 but am not
able to find how I can manage DNS updates centrally. If anyone knows how to
do this, please help me.

On Sat, Dec 17, 2011 at 8:10 PM, Michel Segel wrote:

> Actually I would recommend avoiding /etc/hosts and using DNS if this is
> going to be a production grade cluster...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Dec 17, 2011, at 5:40 AM, alo alt  wrote:
>
> > Hi,
> >
> > In the slaves file too. /etc/hosts is also recommended, to avoid DNS
> > issues. After adding it to slaves, the new node has to be started and
> > should quickly appear in the web-ui. If you don't need the nodes all
> > the time you can set up an exclude and refresh your cluster
> > (
> http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
> )
> >
> > - Alex
> >
> > On Sat, Dec 17, 2011 at 12:06 PM, madhu phatak 
> wrote:
> >> Hi,
> >>  I am trying to add nodes dynamically to a running hadoop cluster. I
> >> started the tasktracker and datanode on the node. It works fine. But
> >> when some node tries to fetch values (for the reduce phase) it fails
> >> with an unknown host exception. When I add a node to a running cluster,
> >> do I have to add its hostname to the /etc/hosts file on all nodes
> >> (slaves + master)? Or is there some other way?
> >>
> >>
> >> --
> >> Join me at http://hadoopworkshop.eventbrite.com/
> >
> >
> >
> > --
> > Alexander Lorenz
> > http://mapredit.blogspot.com
> >
> > Think of the environment: please don't print this email unless you
> > really need to.
> >
>



-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Dynamically adding nodes in Hadoop

2011-12-17 Thread Michel Segel
Actually I would recommend avoiding /etc/hosts and using DNS if this is going 
to be a production grade cluster...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Dec 17, 2011, at 5:40 AM, alo alt  wrote:

> Hi,
> 
> In the slaves file too. /etc/hosts is also recommended, to avoid DNS
> issues. After adding it to slaves, the new node has to be started and
> should quickly appear in the web-ui. If you don't need the nodes all
> the time you can set up an exclude and refresh your cluster
> (http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F)
> 
> - Alex
> 
> On Sat, Dec 17, 2011 at 12:06 PM, madhu phatak  wrote:
>> Hi,
>>  I am trying to add nodes dynamically to a running hadoop cluster. I started
>> the tasktracker and datanode on the node. It works fine. But when some node
>> tries to fetch values (for the reduce phase) it fails with an unknown host
>> exception. When I add a node to a running cluster, do I have to add its
>> hostname to the /etc/hosts file on all nodes (slaves + master)? Or is there
>> some other way?
>> 
>> 
>> --
>> Join me at http://hadoopworkshop.eventbrite.com/
> 
> 
> 
> -- 
> Alexander Lorenz
> http://mapredit.blogspot.com
> 
> Think of the environment: please don't print this email unless you
> really need to.
> 


Re: Dynamically adding nodes in Hadoop

2011-12-17 Thread alo alt
Hi,

In the slaves file too. /etc/hosts is also recommended, to avoid DNS
issues. After adding it to slaves, the new node has to be started and
should quickly appear in the web-ui. If you don't need the nodes all
the time you can set up an exclude and refresh your cluster
(http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F)
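
A sketch of that exclude-and-refresh flow, assuming your dfs.hosts.exclude
points at the file below (the hostname is an example):

  echo slave03.example.com >> /etc/hadoop/conf/dfs.hosts.exclude
  hadoop dfsadmin -refreshNodes
  # the node should then show as decommissioning in the namenode web-ui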

- Alex

On Sat, Dec 17, 2011 at 12:06 PM, madhu phatak  wrote:
> Hi,
>  I am trying to add nodes dynamically to a running hadoop cluster. I started
> the tasktracker and datanode on the node. It works fine. But when some node
> tries to fetch values (for the reduce phase) it fails with an unknown host
> exception. When I add a node to a running cluster, do I have to add its
> hostname to the /etc/hosts file on all nodes (slaves + master)? Or is there
> some other way?
>
>
> --
> Join me at http://hadoopworkshop.eventbrite.com/



-- 
Alexander Lorenz
http://mapredit.blogspot.com

Think of the environment: please don't print this email unless you
really need to.


Re: Dynamically adding nodes in Hadoop

2011-12-17 Thread Harsh J
Madhu,

On Sat, Dec 17, 2011 at 4:36 PM, madhu phatak  wrote:
> When I add a node to a running cluster, do I have to add its hostname to
> the /etc/hosts file on all nodes (slaves + master)?

Yes.

> Or is there some other way?

You can run a DNS, and have the resolution centrally managed.
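
If you stick with /etc/hosts instead, it just means a line like the
following on every node, master and slaves alike (IP and hostname below
are examples):

  # append to /etc/hosts on every node in the cluster
  10.0.0.17   slave07.example.com   slave07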

-- 
Harsh J


Dynamically adding nodes in Hadoop

2011-12-17 Thread madhu phatak
Hi,
 I am trying to add nodes dynamically to a running hadoop cluster. I started
the tasktracker and datanode on the node. It works fine. But when some node
tries to fetch values (for the reduce phase) it fails with an unknown host
exception. When I add a node to a running cluster, do I have to add its
hostname to the /etc/hosts file on all nodes (slaves + master)? Or is there
some other way?


-- 
Join me at http://hadoopworkshop.eventbrite.com/


After adding nodes to 0.20.2 cluster, getting "Could not complete file" errors and hung JobTracker

2010-10-15 Thread Bobby Dennett
Hi all,

We are currently in the process of replacing the servers in our Hadoop
0.20.2 production cluster and in the last couple of days have
experienced an error similar to the following (from the JobTracker
log) several times, which then appears to hang the JobTracker:

2010-10-15 04:13:38,980 INFO org.apache.hadoop.mapred.JobInProgress:
Job job_201010140844_0510 has completed successfully.
2010-10-15 04:13:44,192 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:44,592 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:44,993 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:45,393 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...
2010-10-15 04:13:45,794 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file
/user/kaduindexer-18509/us/201010150300/dealdocid_pre_merged_1/_logs/history/phx-phadoop34_1287060250080_job_201010140844_0510_se_DocID_Merge_1_201010150300
retrying...

We hadn't seen an issue like this until we added 6 new nodes to our
existing 65-node cluster. The only other configuration change made
recently was to set up include/exclude files for DFS and MapReduce to
"enable" Hadoop's node decommissioning functionality.

Once we encounter this issue (which has happened twice in the last 24
hours), we end up needing to restart the MapReduce processes, which we
cannot do on a frequent basis. After the last occurrence, I increased
the value of mapred.job.tracker.handler.count to 60 and am waiting
to see if it has an impact.
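
For reference, that is this mapred-site.xml property (60 is simply the
value I picked; I believe the default in 0.20.2 is 10):

  <property>
    <name>mapred.job.tracker.handler.count</name>
    <value>60</value>
  </property>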

Has anyone else seen this behavior before? Are there any
recommendations for trying to prevent this from happening in the
future?

Thanks in advance,
-Bobby