I may be wrong, but my intuition keeps telling me it is a DNS issue.
In your configuration below you have websearch1. Have you tried fully
qualified domain names in both the hadoop-site.xml and slaves files?
For example websearch1.internal.com:9000, where internal.com is your
internal domain. I have seen some computers have a hard time connecting
when not using fully qualified names. Try that and let me know the result.
The other thing I see, although I don't think it is causing any
problems, is that some of the property names below contain spaces
(mapred.tasktracker.tasks.maximum and dfs.replication). I would remove
the spaces.
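To make the first suggestion concrete, the two address properties would
look something like this (internal.com here is only a placeholder for
whatever your actual internal domain is):

```xml
<!-- hadoop-site.xml: point both addresses at the master's fully
     qualified name. internal.com is a placeholder domain. -->
<property>
  <name>fs.default.name</name>
  <value>websearch1.internal.com:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>websearch1.internal.com:9001</value>
</property>
```

and the slaves file would list the same fully qualified names, one per line.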
Dennis
srinath wrote:
Hi Dennis,
Yes, I can ping from the slave node (there are no problems with the
network connections, and I have successfully configured Nutch 0.8.1 and
crawled some pages without any issues). Below I'm attaching the
configuration I'm using in the hadoop-site.xml file and the names of the
systems I listed in the slaves file.
The content of hadoop-site.xml is as follows :
<configuration>
<property>
<name>fs.default.name</name>
<value>websearch1:9000</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>websearch1:9001</value>
</property>
<property>
<name>mapred.tasktracker.tasks.maximum </name>
<value>20</value>
</property>
<property>
<name>mapred.map.tasks</name>
<value>60</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>6</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/nutch/nutch0.9/filesystem0.9/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/nutch/nutch0.9/filesystem0.9/data</value>
</property>
<property>
<name> dfs.replication</name>
<value>2</value>
</property>
</configuration>
and the slaves file contains:
websearch1
websearch2
On websearch1 the datanode starts without any problems, but on
websearch3 it says it has started the tasktracker and datanode, yet
neither can send a heartbeat to the master node (I mean they are not
able to register themselves with the master node). I keep getting the
same error as mentioned before.
One more thing I tried yesterday was changing the Hadoop version from
0.9.1 to 0.5. Then the namenode and datanode started successfully, but
the jobtracker failed to start because Nutch 0.9.1 uses some classes
that are not available in 0.5. I also tried 0.7.1 and 0.8.2 with the
same problems, and with 0.9.2 I couldn't succeed either. So I feel it
has something to do with the configuration?
Dennis Kubes wrote:
Can you ping the master computer (name node) from the slave (data node)
computers? Also, is your namenode configuration's fs.default.name
variable pointing to 127.0.0.1, or is it pointing to the fully qualified
domain name of the master computer?
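One related thing worth checking (this is an assumption about the setup,
not something visible in the logs): if /etc/hosts on the master maps the
master's own hostname to 127.0.0.1, the daemons end up reachable only on
the loopback interface and the slaves can never register. The entry
should map the hostname to the real LAN address, along these lines:

```
# /etc/hosts on websearch1 -- the hostname should resolve to the LAN
# address, not loopback. 10.50.12.220 is the address shown in the
# slave's log below; internal.com is a placeholder domain.
127.0.0.1      localhost
10.50.12.220   websearch1.internal.com   websearch1
```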
Dennis Kubes
srinath wrote:
Hi,
Thanks for your reply, but the namenode was started successfully on
the master node. The datanode we started on the machine where the
master node is running is able to connect, but the datanode on the
other machine is not able to connect back. If you would like to see
them, I will post the configuration parameters we set.
Dennis Kubes wrote:
I would take a look at the processes on the namenode server and see if
the namenode has started up. It doesn't look like it did. If this is a
new install, did you format the namenode?
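Checking that can look something like the following on the master node
(a sketch: jps ships with the JDK, the paths assume you run from the
Hadoop/Nutch install directory, and the format command erases any
existing DFS metadata, so only run it on a truly fresh install):

```
# List the running Hadoop JVMs; a healthy master should show
# NameNode and JobTracker entries.
jps

# For a brand-new install, format the filesystem once before the
# first start (this destroys any existing DFS metadata):
bin/hadoop namenode -format
```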
Dennis
srinath wrote:
Hi,
While starting the Hadoop processes we are getting the following error
in the logs. The tasktracker on the datanode is not able to connect back
to the jobtracker (but the jobtracker on the other machine started
successfully and is listening on port 9001). I'm using Nutch 0.9.1 and
Hadoop 0.9.1.
2007-01-04 23:57:35,559 INFO ipc.Server - IPC Server handler 17 on 50050: starting
2007-01-04 23:57:35,559 INFO ipc.Server - IPC Server handler 18 on 50050: starting
2007-01-04 23:57:35,559 INFO mapred.TaskTracker - Starting tracker tracker_websearch3:50050
2007-01-04 23:57:35,559 INFO ipc.Server - IPC Server handler 19 on 50050: starting
2007-01-04 23:57:35,566 INFO ipc.Client - org.apache.hadoop.io.ObjectWritable Connection Culler maxidletime= 1000ms
2007-01-04 23:57:35,567 INFO ipc.Client - org.apache.hadoop.io.ObjectWritable Connection Culler: starting
2007-01-04 23:57:35,589 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 1 time(s).
2007-01-04 23:57:36,590 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 2 time(s).
2007-01-04 23:57:37,600 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 3 time(s).
2007-01-04 23:57:38,610 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 4 time(s).
2007-01-04 23:57:39,620 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 5 time(s).
2007-01-04 23:57:40,630 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 6 time(s).
2007-01-04 23:57:41,640 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 7 time(s).
2007-01-04 23:57:42,650 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 8 time(s).
2007-01-04 23:57:43,660 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 9 time(s).
2007-01-04 23:57:44,670 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 10 time(s).
2007-01-04 23:57:45,680 INFO ipc.RPC - Server at websearch1/10.50.12.220:9001 not available yet, Zzzzz...
2007-01-04 23:57:46,690 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 1 time(s).
2007-01-04 23:57:47,700 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 2 time(s).
2007-01-04 23:57:48,710 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 3 time(s).
2007-01-04 23:57:49,720 INFO ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 4 time(s).
Can anyone help with this? Does it have something to do with the Hadoop
configuration?