I may be wrong, but my intuition keeps telling me it is a DNS issue. Below you have websearch1. Have you tried fully qualified domain names in both the hadoop-site.xml and slaves files? For example websearch1.internal.com:9000, where internal.com is your internal domain. I have seen some computers have a hard time connecting when not using fully qualified names. Try that and let me know the result.
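A quick way to sanity-check this is to make sure every node resolves the master's bare hostname to its LAN IP rather than 127.0.0.1. The sketch below works against an inline stand-in hosts table so it is self-contained (10.50.12.220 comes from the log further down; internal.com and the second entry are placeholders) — on the real machines you would just run `getent hosts websearch1` and `hostname -f`:

```shell
# Stand-in for /etc/hosts entries (internal.com is a hypothetical domain).
# A healthy mapping resolves the short name to the LAN IP, not 127.0.0.1.
hosts='10.50.12.220 websearch1.internal.com websearch1
10.50.12.221 websearch2.internal.com websearch2'

# Resolve "websearch1" the way getent hosts would against this table:
ip=$(printf '%s\n' "$hosts" | awk '$3 == "websearch1" {print $1; exit}')
echo "$ip"   # → 10.50.12.220
```

If the lookup on a slave instead returns 127.0.0.1 (a common default when the hostname is listed on the loopback line), the daemons will try to register against loopback and never reach the master.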

The other thing I see, although I don't think it is causing any problems, is that some of the property names below contain spaces. I would remove the spaces.
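For instance, the replication entry would read (whitespace trimmed from inside the `<name>` element):

```xml
<!-- no leading space before dfs.replication -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```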

Dennis

srinath wrote:
Hi Dennis,
Yes, I can ping from the slave node (there are no problems with the
network connections, and I have successfully configured Nutch 0.8.1 and
crawled some pages without any issues). Below I am attaching the
configuration I am using in the hadoop-site.xml file and the names of the
systems I listed in the slaves file.

The content of hadoop-site.xml is as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>websearch1:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>websearch1:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.tasks.maximum </name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>60</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>6</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/nutch/nutch0.9/filesystem0.9/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/nutch/nutch0.9/filesystem0.9/data</value>
  </property>
  <property>
    <name> dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
and slaves.txt contains:

websearch1
websearch2
On websearch1 the datanode is starting without any problems, but
on websearch3 it says it started the tasktracker and datanode, yet they both
can't send a heartbeat to the master node (I mean they are not able to register
themselves with the master node). I'm continuously getting the same error
as mentioned before.
              One more thing I tried yesterday was to change the Hadoop
version from 0.9.1 to 0.5; then the namenode and datanode started successfully,
but the jobtracker failed to start because Nutch 0.9.1 uses some
classes that are not available in 0.5. I also tried 0.7.1 and 0.8.2,
with the same problems, and with 0.9.2 I couldn't succeed either.
So I feel it has something to do with the configuration?

Dennis Kubes wrote:
Can you ping the master computer (namenode) from the slave (datanode) computers? Also, is the fs.default.name variable in your namenode configuration pointing to 127.0.0.1, or is it pointing to the fully qualified domain name of the master computer?
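One way to check that second point is to pull the value straight out of the config. The sketch below runs against an inline stand-in string so it is self-contained; against the real file the same `sed` expression assumes `<name>` and `<value>` sit on one line, so with a pretty-printed file you would use `grep -A1 fs.default.name conf/hadoop-site.xml` instead:

```shell
# Stand-in for the relevant line of conf/hadoop-site.xml:
conf='<property><name>fs.default.name</name><value>websearch1:9000</value></property>'

# Extract the value; it should name the master host, not 127.0.0.1 or localhost.
val=$(printf '%s\n' "$conf" | sed -n 's|.*<name>fs.default.name</name><value>\([^<]*\)</value>.*|\1|p')
echo "$val"   # → websearch1:9000
```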

Dennis Kubes

srinath wrote:
Hi,
   Thanks for your reply. But the namenode was started successfully on the
master node, and the datanode we started on the machine where the master node is
running is able to connect; the datanode on the other machine, however, is not
able to connect back! If you would like to see them, I will post the
configuration parameters we set.


Dennis Kubes wrote:
I would take a look at the processes on the namenode server and see if the namenode has started up. It doesn't look like it did. If this is a new install, did you format the namenode?
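On the Hadoop versions of that era, formatting a fresh namenode looked roughly like this (run on the master from the install directory; this wipes any existing DFS image under dfs.name.dir, so only do it on a new cluster):

```shell
# Stop any running daemons, format the new filesystem image, restart.
bin/stop-all.sh
bin/hadoop namenode -format
bin/start-all.sh
```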

Dennis

srinath wrote:
Hi,
    While starting the Hadoop processes we are getting the following error in
the logs: the tasktracker on the datanode is not able to connect back to the
jobtracker (but the jobtracker on the other machine started successfully and is
listening on port 9001). I'm using Nutch 0.9.1 and Hadoop 0.9.1.

2007-01-04 23:57:35,559 INFO  ipc.Server - IPC Server handler 17 on 50050: starting
2007-01-04 23:57:35,559 INFO  ipc.Server - IPC Server handler 18 on 50050: starting
2007-01-04 23:57:35,559 INFO  mapred.TaskTracker - Starting tracker tracker_websearch3:50050
2007-01-04 23:57:35,559 INFO  ipc.Server - IPC Server handler 19 on 50050: starting
2007-01-04 23:57:35,566 INFO  ipc.Client - org.apache.hadoop.io.ObjectWritable Connection culler maxidletime= 1000ms
2007-01-04 23:57:35,567 INFO  ipc.Client - org.apache.hadoop.io.ObjectWritable Connection Culler: starting
2007-01-04 23:57:35,589 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 1 time(s).
2007-01-04 23:57:36,590 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 2 time(s).
2007-01-04 23:57:37,600 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 3 time(s).
2007-01-04 23:57:38,610 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 4 time(s).
2007-01-04 23:57:39,620 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 5 time(s).
2007-01-04 23:57:40,630 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 6 time(s).
2007-01-04 23:57:41,640 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 7 time(s).
2007-01-04 23:57:42,650 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 8 time(s).
2007-01-04 23:57:43,660 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 9 time(s).
2007-01-04 23:57:44,670 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 10 time(s).
2007-01-04 23:57:45,680 INFO  ipc.RPC - Server at websearch1/10.50.12.220:9001 not available yet, Zzzzz...
2007-01-04 23:57:46,690 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 1 time(s).
2007-01-04 23:57:47,700 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 2 time(s).
2007-01-04 23:57:48,710 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 3 time(s).
2007-01-04 23:57:49,720 INFO  ipc.Client - Retrying connect to server: websearch1/10.50.12.220:9001. Already tried 4 time(s).


Can anyone help regarding this? Does it have something to do with the Hadoop
configuration?
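Given that retry loop, one thing worth checking on websearch1 itself is which address the jobtracker actually bound port 9001 to. The sketch below parses a fabricated example of unhealthy `netstat -tln` output so it is self-contained; on the real master you would run `netstat -tln | grep 9001`:

```shell
# Fabricated netstat -tln line: the jobtracker bound only to loopback,
# which slaves on other machines cannot reach.
line='tcp 0 0 127.0.0.1:9001 0.0.0.0:* LISTEN'

addr=$(printf '%s\n' "$line" | awk '{print $4}')
case "$addr" in
  127.0.0.1:*) msg="bound to loopback only: unreachable from slaves" ;;
  *)           msg="bound to $addr" ;;
esac
echo "$msg"   # → bound to loopback only: unreachable from slaves
```

If the real output shows 127.0.0.1:9001, that again points at the hostname resolving to loopback on the master, which the FQDN suggestion above would fix.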

