Crawl failing when using hadoop

Karthik Ramesh Sun, 10 Feb 2008 02:20:41 -0800

Hi,

I have just started using Hadoop to run Nutch crawls on a cluster of 5 
servers. I am using Nutch 0.9.
I have gone through the initial setup as described in 
http://wiki.apache.org/nutch/NutchHadoopTutorial.


I am able to start all the servers using start-all.sh and to upload the list 
of URLs to the DFS. But after I initiate the crawl, I get the following 
exception:

crawl started in: crawled
rootUrlDir = urls
threads = 10
depth = 3
Injector: starting
Injector: crawlDb: crawled/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:519)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:149)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:531)
        at org.apache.hadoop.ipc.Client.call(Client.java:458)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
        at $Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:247)
        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:208)
        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:200)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:528)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:115)
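
The trace fails inside JobClient.init, while the client is opening its RPC 
connection to the JobTracker. One quick check is therefore whether that 
address accepts TCP connections at all from the machine where the crawl is 
launched. Below is a minimal sketch of such a check in Python. The host and 
port are placeholders (assumptions, not values from this setup) and should be 
replaced with whatever mapred.job.tracker is set to in conf/hadoop-site.xml. 
The helper name can_connect is also just illustrative.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder address: substitute the host and port from mapred.job.tracker.
    host, port = "localhost", 9001
    status = "reachable" if can_connect(host, port) else "refused or unreachable"
    print(host, port, status)
```

If this reports the address as refused, the JobTracker is probably not 
listening there (not running, or bound to a different host or port than the 
client configuration names). If it is reachable, the problem likely lies 
elsewhere in the client configuration.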


Any idea where I could be going wrong?
Thanks,

- Karthik.



