Just a quick note on pseudo-distributed mode and its advantages: you can configure Hadoop to run multiple tasks in parallel, so, as James said, you make better use of your hardware. You also get more information about your crawl through the JobTracker UI and the job counters, which is more convenient than having to go through the log files. It makes sense to use it even if the number of pages you want to crawl is very small. It is also a good way to get familiar with how things are done in fully distributed mode.
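To illustrate (this is not from the thread itself): in Hadoop 1.x, the per-node task parallelism mentioned above is typically set in mapred-site.xml. The property names below are the standard 1.x ones; the slot counts are just example values to tune for your machine:

```xml
<!-- mapred-site.xml sketch for pseudo-distributed Hadoop 1.x.
     The slot counts (2 map, 2 reduce) are example assumptions;
     adjust to your CPU cores and memory. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```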
I also agree with James that local mode is useful for debugging when writing new plugins or trying to reproduce a bug. I'll let one of the Nutch 2 + HBase experts answer your question.

J.

On 16 October 2012 07:30, <[email protected]> wrote:

> Non-distributed mode uses a single Java process. This is useful for
> debugging.
> Pseudo-distributed mode is where each Hadoop daemon runs in a separate
> Java process. This may perform better than non-distributed mode on a
> multi-CPU or multi-core machine.
>
> As to ideas for fixing the issue, I am way out of my depth, so I will
> leave that to somebody else to answer.
>
> Good luck,
>
> James
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, October 16, 2012 11:51 AM
> To: [email protected]
> Subject: nutch-2.0-fetcher fails in reduce stage
>
> Hello,
>
> I am trying to use nutch-2.0, hadoop-1.0.3 and hbase-0.92.1 in
> pseudo-distributed mode with iptables turned off. As soon as the map
> phase reaches 100%, the fetcher works for a few minutes and then fails
> with the error:
>
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:328)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:362)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1045)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:897)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
>     at $Proxy10.getClosestRowBefore(Unknown Source)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:947)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:814)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:788)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1024)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:818)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1524)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
>     at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:943)
>     at org.apache.gora.hbase.store.HBaseTableConnection.close(HBaseTableConnection.java:96)
>     at org.apache.gora.hbase.store.HBaseStore.close(HBaseStore.java:599)
>     at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> org.apache.gora.util.GoraException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to master/192.168.1.4:60020 after attempts=1
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:118)
>     at org.apache.gora.mapreduce.GoraOutputFormat.getRecordWriter(GoraOutputFormat.java:88)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:569)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:638)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to master/192.168.1.4:60020 after attempts=1
>     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:242)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1278)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1235)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1222)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:918)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:814)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:782)
>     at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:249)
>     at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:213)
>     at org.apache.hadoop.hbase.catalog.MetaReader.getHTable(MetaReader.java:200)
>     at org.apache.hadoop.hbase.catalog.MetaReader.getMetaHTable(MetaReader.java:226)
>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:700)
>     at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183)
>     at org.apache.hadoop.hbase.catalog.MetaReader.tableExists(MetaReader.java:448)
>     at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:202)
>     at org.apache.gora.hbase.store.HBaseStore.schemaExists(HBaseStore.java:164)
>     at org.apache.gora.hbase.store.HBaseStore.createSchema(HBaseStore.java:145)
>     at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:130)
>     at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
>     at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
>     ... 10 more
>
> bin/nutch inject works fine. Also, I have a different Linux box where the
> fetcher with the same config runs fine, but the generated set is much
> smaller than on the first box.
>
> Any ideas on how to fix this issue, and what is the benefit of running the
> fetcher in pseudo-distributed mode versus local mode?
>
> Thanks,
> Alex.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
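A note for readers hitting the same error: the trace shows the reduce task getting "Connection refused" on the HBase region server port. A minimal way to check connectivity independently of Nutch, Gora, and HBase is a plain TCP connect with a timeout. This is just a diagnostic sketch, not anything from the thread; the default host ("master") and port (60020) are taken from the stack trace above and should be adjusted to your setup. If this reports the port as unreachable, the problem is the region server itself (not running, or bound to a different interface), not Nutch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // A ConnectException here corresponds to the "Connection refused"
            // in the trace: nothing is listening on that host:port.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port from the stack trace above; override via arguments.
        String host = args.length > 0 ? args[0] : "master";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 2000));
    }
}
```

If the port turns out to be closed, checking which daemons are up (`jps`) and how the hostname resolves is usually the next step.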

