Hi,

I don't know whether this is the solution to your problem, but you should at least add the depth parameter:

# ./bin/nutch crawl urls/ -dir aaa -depth 10 -threads 20
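That only fixes the missing depth, though. The SocketTimeoutException in your trace usually means the Nutch client cannot reach the Hadoop namenode over RPC at all, so it may also be worth checking that the Hadoop daemons are actually running and that fs.default.name in conf/hadoop-site.xml points at them. A minimal sketch for a single-node setup; the host and port (localhost:9000) are assumptions, adjust them to your configuration:

  <!-- conf/hadoop-site.xml: where the DFS client looks for the namenode -->
  <!-- localhost:9000 is an assumed value; use whatever your namenode listens on -->
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>

  # start the daemons before crawling, then verify they are up
  $ bin/start-all.sh
  $ jps    # should list NameNode, DataNode, JobTracker, TaskTracker

If jps shows no NameNode, the timeout is expected regardless of the crawl parameters.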
Cheers,
Alex

> -----Original Message-----
> From: toabhishek16 [mailto:[EMAIL PROTECTED]
> Sent: Monday, 22 September 2008 10:14
> To: [email protected]
> Subject: Error in hadoop crawling
>
> Hi to all,
> I am trying to crawl using Hadoop on a single machine. Other commands like
> hadoop namenode -format and the examples provided with Hadoop work fine.
> But when I try to crawl using Hadoop, it gives the error I am pasting below:
>
> [EMAIL PROTECTED] nutch-0.9]# ./bin/nutch crawl urls/ -dir aaa
> Exception in thread "main" java.net.SocketTimeoutException: timed out
> waiting for rpc response
>         at org.apache.hadoop.ipc.Client.call(Client.java:473)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:163)
>         at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:247)
>         at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:105)
>         at org.apache.hadoop.dfs.DistributedFileSystem$RawDistributedFileSystem.initialize(DistributedFileSystem.java:67)
>         at org.apache.hadoop.fs.FilterFileSystem.initialize(FilterFileSystem.java:57)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
>         at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:83)
>
> Please help me to solve this problem.
>
> Thanks in advance.
