Re: Optimizing Nutch 2.2.1

BlackIce Wed, 19 Mar 2014 11:49:08 -0700

Thank you,

What are some good starting points for tuning?


Thanks


On Tue, Mar 18, 2014 at 8:20 PM, Talat Uyarer <[email protected]> wrote:

> Hi,
>
> When you use Hadoop in pseudo-distributed mode, it creates 2 map and 2
> reduce tasks. If you want to speed up a job, you should decrease your map
> and reduce counts. But optimization is a very general concept. You should
> tune the Nutch, HDFS, JobTracker and HBase settings.
>
> Good luck ;)
>
>
> 2014-03-18 14:00 GMT+02:00 BlackIce <[email protected]>:
>
> > Hi,
> >
> > I'm using Nutch 2.2.1, HBase 0.90.6 in pseudo-distributed mode, Hadoop
> > 1.2.1, Oracle Java 8, Intel i5 quad-core, 16 GB RAM.
> >
> > Currently the Fetch cycle is limited by my Internet connection.
> >
> > Parse cycle uses an average of 10% per CPU core
> >
> > Updatedb cycle uses an average of 3% per CPU core
> >
> > Currently I'm only running HBase in pseudo-distributed mode, not Nutch.
> >
> > As the DB grows, everything slows down significantly, but as you can see,
> > CPU resources are not used very much. Heck, during updatedb my web
> > browsing creates higher utilization spikes than the updatedb process. I
> > feel that my hardware is very underutilized and that adding more physical
> > machines would be a waste.
> >
> > What are the bottlenecks? How can I optimize them? Should I run a cluster
> > on 3 virtual machines?
> >
> > Thank you for any help you can give!
> >
> >
> > Ralf R. Kotowski
> >
>
>
>
> --
> Talat UYARER
> Website: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
>
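For the map and reduce counts mentioned in the quoted reply, here is a rough sketch of how the slot settings could be written out. It assumes the "2 map and 2 reduce" refers to the Hadoop 1.x TaskTracker slot limits, which as far as I know default to 2 each. The value 4 and the helper name render_site_xml are only illustrative assumptions, not recommended settings, and the properties should only matter when the Nutch jobs are actually submitted to a running JobTracker/TaskTracker.

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x TaskTracker slot properties (each defaults to 2).
# The value "4" is only an illustrative guess for a quad-core machine.
SLOT_SETTINGS = {
    "mapred.tasktracker.map.tasks.maximum": "4",
    "mapred.tasktracker.reduce.tasks.maximum": "4",
}


def render_site_xml(settings):
    """Return a Hadoop-style <configuration> document as a string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the XML so it can be merged by hand into conf/mapred-site.xml.
    print(render_site_xml(SLOT_SETTINGS))
```

The printed properties would go into conf/mapred-site.xml, followed by a TaskTracker restart. Whether raising or lowering the counts helps depends on where the time is really going, so measure one change at a time.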
