Hi,
I have just started using Hadoop to run Nutch crawls on a cluster of 5
servers. I am using Nutch 0.9.
I have gone through the initial setup as described in the tutorial at
http://wiki.apache.org/nutch/NutchHadoopTutorial
I am able to start all the servers using start-all.sh and to upload
the list of URLs to the DFS. But after I start the crawl,
I get the following exception:
crawl started in: crawled
rootUrlDir = urls
threads = 10
depth = 3
Injector: starting
Injector: crawlDb: crawled/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:149)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:531)
at org.apache.hadoop.ipc.Client.call(Client.java:458)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:247)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:208)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:200)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:528)
at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
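
The trace fails inside JobClient.init, which is where the client opens an RPC connection to the JobTracker, so the refused connection is probably to the address set in mapred.job.tracker (in conf/hadoop-site.xml) rather than to the DFS. Below is a minimal sketch for checking whether that host and port accept TCP connections from the machine that launches the crawl. The class name PortCheck, the default host "localhost" and the default port 9001 are assumptions for illustration only; substitute the values from your own configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /**
     * Returns true if a TCP connection to host:port can be opened
     * within timeoutMs milliseconds, false on any I/O failure
     * (connection refused, timeout, unknown host, ...).
     */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults; pass the host and port from
        // mapred.job.tracker as arguments instead.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9001;

        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port
                + (ok ? " accepts connections" : " refused or unreachable"));
    }
}
```

Run it as `java PortCheck <jobtracker-host> <port>` from the node where the crawl command is issued. If the port is refused, likely candidates are a JobTracker that did not start (check its log under the Hadoop logs directory) or a mapred.job.tracker value that does not match the host the JobTracker is actually running on.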
Any idea where I could be going wrong?
Thanks,
- Karthik.