Thank you. What are some good starting points for tuning?
Thanks

On Tue, Mar 18, 2014 at 8:20 PM, Talat Uyarer <[email protected]> wrote:

> Hi,
>
> When you use Hadoop in pseudo-distributed mode, it creates 2 map and 2
> reduce tasks. If you want to speed up a job, you should decrease your map
> and reduce count. But optimization is a very general concept. You should
> tune the Nutch, HDFS, JobTracker and HBase settings.
>
> Good luck ;)
>
>
> 2014-03-18 14:00 GMT+02:00 BlackIce <[email protected]>:
>
> > Hi,
> >
> > I'm using Nutch 2.2.1, HBase 0.90.6 in pseudo-distributed mode, Hadoop
> > 1.2.1, Oracle Java 8, Intel i5 quad-core, 16 GB RAM.
> >
> > Currently the fetch cycle is limited by my Internet connection.
> >
> > The parse cycle uses an average of 10% per CPU core.
> >
> > The updatedb cycle uses an average of 3% per CPU core.
> >
> > Currently I'm only running HBase in pseudo-distributed mode, not Nutch.
> >
> > As the DB grows, everything slows down significantly, but as you can see,
> > CPU resources are not used very much. Heck, during updatedb my web
> > browsing creates higher utilization spikes than the updatedb process. I
> > feel that my hardware is very underutilized and adding more physical
> > machines would be a waste.
> >
> > What are the bottlenecks? How can I optimize them? Should I run a cluster
> > on 3 virtual machines?
> >
> > Thank you for any help you can give!
> >
> >
> > Ralf R. Kotowski
> >
>
>
>
> --
> Talat UYARER
> Website: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>

