Hi,

When you run Hadoop in pseudo-distributed mode, each TaskTracker gets 2 map
slots and 2 reduce slots by default. If you want to speed up a job on a
multi-core machine, you should increase your map and reduce slot counts so
the CPUs are actually used. But optimization is a very broad topic: you
should tune your Nutch, HDFS, JobTracker and HBase settings together.
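
For example (the slot values below are only illustrative; the right numbers
depend on your cores and memory), in Hadoop 1.x you raise the per-TaskTracker
slot counts in mapred-site.xml and restart the TaskTracker:

  <!-- mapred-site.xml (Hadoop 1.x). The counts here are a sketch for a
       quad-core machine, not a recommendation. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>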

Good luck ;)


2014-03-18 14:00 GMT+02:00 BlackIce <[email protected]>:

> Hi,
>
> I'm using Nutch 2.2.1, HBase 0.90.6 in pseudo-distributed mode, Hadoop
> 1.2.1, Oracle Java 8, on an Intel i5 quad-core with 16 GB RAM.
>
> Currently the Fetch cycle is limited by my Internet connection.
>
> Parse cycle uses an average of 10% per CPU core
>
> Updatedb cycle uses an average of 3% per CPU core
>
> Currently I'm only running HBase in pseudo-distributed mode, not Nutch.
>
> As the DB grows, everything slows down significantly, but as you can see, CPU
> resources are not used very much; heck, during updatedb my web browsing
> creates higher utilization spikes than the updatedb process does. I feel that
> my hardware is very underutilized and that adding more physical machines
> would be a waste.
>
> What are the bottlenecks? How can I optimize them? Should I run a cluster
> on 3 virtual machines?
>
> Thank you for any help you can give!
>
>
> Ralf R. Kotowski
>



-- 
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
