Hi, I'm using Nutch 2.2.1, HBase 0.90.6 in pseudo-distributed mode, Hadoop 1.2.1, Oracle Java 8, on an Intel i5 quad-core with 16 GB RAM.
Currently the fetch cycle is limited by my Internet connection.
The parse cycle uses an average of 10% per CPU core.
The updatedb cycle uses an average of 3% per CPU core.

Currently only HBase is running in pseudo-distributed mode, not Nutch.

As the DB grows, everything slows down significantly, but as you can see, CPU resources are barely used. During updatedb, my web browsing creates higher utilization spikes than the updatedb process itself. I feel that my hardware is very underutilized and that adding more physical machines would be a waste.

What are the bottlenecks? How can I optimize them? Should I run a cluster on 3 virtual machines?

Thank you for any help you can give!

Ralf R. Kotowski

